How can I get all rows with keys provided in a list using SQLalchemy?
I have sequence of IDs I want to retrieve. It's simple:
session.query(Record).filter(Record.id.in_(seq)).all()
Is there a better way to do it?
Your code is absolutety fine.
IN
is like a bunch of X=Y
joined with OR
and is pretty fast in contemporary databases.
However, if your list of IDs is long, you could make the query a bit more efficient by passing a sub-query returning the list of IDs.
The code as is is completely fine. However, someone is asking me for some system of hedging between the two approaches of doing a big IN vs. using get() for individual IDs.
If someone is really trying to avoid the SELECT, then the best way to do that is to set up the objects you need in memory ahead of time. Such as, you're working on a large table of elements. Break up the work into chunks, such as, order the full set of work by primary key, or by date range, whatever, then load everything for that chunk locally into a cache:
all_ids = [<huge list of ids>]
all_ids.sort()
while all_ids:
chunk = all_ids[0:1000]
# bonus exercise! Throw each chunk into a multiprocessing.pool()!
all_ids = all_ids[1000:]
my_cache = dict(
Session.query(Record.id, Record).filter(
Record.id.between(chunk[0], chunk[-1]))
)
for id_ in chunk:
my_obj = my_cache[id_]
<work on my_obj>
That's the real world use case.
But to also illustrate some SQLAlchemy API, we can make a function that does the IN for records we don't have and a local get for those we do. Here is that:
from sqlalchemy import inspect
def get_all(session, cls, seq):
mapper = inspect(cls)
lookup = set()
for ident in seq:
key = mapper.identity_key_from_primary_key((ident, ))
if key in session.identity_map:
yield session.identity_map[key]
else:
lookup.add(ident)
if lookup:
for obj in session.query(cls).filter(cls.id.in_(lookup)):
yield obj
Here is a demonstration:
from sqlalchemy import Column, Integer, create_engine, String
from sqlalchemy.orm import Session
from sqlalchemy.ext.declarative import declarative_base
import random
Base = declarative_base()
class A(Base):
__tablename__ = 'a'
id = Column(Integer, primary_key=True)
data = Column(String)
e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)
ids = range(1, 50)
s = Session(e)
s.add_all([A(id=i, data='a%d' % i) for i in ids])
s.commit()
s.close()
already_loaded = s.query(A).filter(A.id.in_(random.sample(ids, 10))).all()
assert len(s.identity_map) == 10
to_load = set(random.sample(ids, 25))
all_ = list(get_all(s, A, to_load))
assert set(x.id for x in all_) == to_load
If you use composite primary keys, you can use tuple_
, as in
from sqlalchemy import tuple_
session.query(Record).filter(tuple_(Record.id1, Record.id2).in_(seq)).all()
Note that this is not available on SQLite (see doc).