SQLAlchemy - Difference between query and query.all() in for loops

I would like to ask what the difference is between

for row in session.query(Model1):
    pass

and

for row in session.query(Model1).all():
    pass

Is the first somehow an iterator that bombards your DB with single queries, while the latter "eagerly" queries the whole thing as a list (like range(x) vs. xrange(x) in Python 2)?


Nope, there is no difference in DB traffic. The difference is just that for row in session.query(Model1) does the ORM work on each row when it is about to give it to you, while for row in session.query(Model1).all() does the ORM work on all rows before starting to give them to you.

Note that q.all() is just sugar for list(q), i.e. collecting everything yielded by the generator into a list. Here is the source code for it, in the Query class:

def all(self):
    """Return the results represented by this ``Query`` as a list.

    This results in an execution of the underlying query.

    """
    return list(self)

... where self, the query object, is an iterable, i.e. has an __iter__ method.

So logically the two ways are exactly the same in terms of DB traffic; both end up calling query.__iter__() to get a row iterator, and next()ing their way through it.
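As a quick check (a minimal sketch, assuming an in-memory SQLite database and a toy User model made up for this illustration, not anything from the question), turning on echo=True shows that each loop emits exactly one SELECT:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

# echo=True logs every SQL statement the engine emits
engine = create_engine("sqlite://", echo=True)
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(name=f"user{i}") for i in range(3)])
    session.commit()

    for row in session.query(User):        # exactly one SELECT logged here
        pass

    for row in session.query(User).all():  # and exactly one SELECT here
        pass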

The practical difference is that the former can start giving you rows as soon as their data has arrived, “streaming” the DB result set to you, with less memory use and latency. I can't state for sure that all the current engine implementations do that (I hope they do!). In any case the latter version prevents that efficiency, for no good reason.


Actually, the accepted answer is not true (or at least it is no longer true, if it ever was); specifically, the following statements are false:

(1) The difference is just that for row in session.query(Model1) does the ORM work on each row when it is about to give it to you, while for row in session.query(Model1).all() does the ORM work on all rows before starting to give them to you.

SQLAlchemy will always ORM-map all rows, no matter which of the two options you choose. This can be seen in its source code: the loading.instances method does indeed return a generator, but one of already-mapped ORM instances. You can confirm this in the actual generator loop:

for row in rows:  # ``rows`` here are already ORM-mapped instances
    yield row

So by the time the generator yields its first instance, all instances have already been ORM-mapped. (Exemplified below in scenario (1).)

(2) The practical difference is that the former can start giving you rows as soon as their data has arrived, “streaming” the DB result set to you, with less memory use and latency.

As explained above, this is also false: all data retrieved from the DB is processed/mapped before anything is yielded.

The comments are also misleading. Specifically:

The database hit may be the same, but please bear in mind that all will load the entire result set into memory. This could be gigabytes of data

The entire result set will be loaded regardless of whether you use .all() or not, as is clearly seen in the source. This is also very easy to test/verify.
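For instance, here is a rough way to check it (a minimal sketch, assuming an in-memory SQLite database and a throwaway Item model invented for this test): break out of the loop after the first row and look at peak memory. Both variants peak at roughly the size of the whole result set, because everything is fetched and mapped before the first row is handed to you:

import tracemalloc

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    payload = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Seed ~10 MB of row data so the memory peaks are visible.
with Session(engine) as setup:
    setup.add_all([Item(payload="x" * 1_000) for _ in range(10_000)])
    setup.commit()

def peak_bytes(make_iterable):
    # Peak memory while taking only the FIRST row from the iterable.
    with Session(engine) as session:
        tracemalloc.start()
        for row in make_iterable(session):
            break
        peak = tracemalloc.get_traced_memory()[1]
        tracemalloc.stop()
        return peak

print("plain query peak:", peak_bytes(lambda s: s.query(Item)))
print("with .all() peak:", peak_bytes(lambda s: s.query(Item).all()))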

The only difference I can see between the two options is that with .all() you loop twice through the results (if you want to process all instances/rows), since the generator is first iterated/exhausted by the call to list(self) that turns it into a list.

Since the SQLAlchemy code is not easy to digest, I wrote a short snippet to illustrate all of this:

class Query:
    def all(self):
        return list(self)

    def instances(self, rows_to_fetch=5):
        """ORM instance generator"""
        mapped_rows = []
        for i in range(rows_to_fetch):
            # ORM mapping work here as in lines 81-88 from loading.instances
            mapped_rows.append(i)
        print("orm work finished for all rows")
        for row in mapped_rows:  # same as ``yield from mapped_rows``
            print("yield row")
            yield row

    def __iter__(self):
        return self.instances()


query = Query()
print("(1) Generator scenario:")
print("First item of generator: ", next(iter(query)))

print("\n(2) List scenario:")
print("First item of list: ", query.all()[0])

"""
RESULTS:
--------
(1) Generator scenario:
orm work finished for all rows
yield row
First item of generator:  0

(2) List scenario:
orm work finished for all rows
yield row
yield row
yield row
yield row
yield row
First item of list:  0
"""

To have any finer control when processing large result sets, one has to use something like yield_per, and even then it is not a one-by-one scenario: the ORM mapping is still performed in batches of instances, as sketched below.
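For example (a minimal sketch, reusing the hypothetical User model and session from the sketch above; the batch size of 100 is arbitrary):

# Fetch and ORM-map rows in batches of 100 instead of buffering the
# whole result set up front; ``session`` and ``User`` are the made-up
# objects from the earlier sketch.
for user in session.query(User).yield_per(100):
    print(user.name)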

It is worth noting the one comment that was indeed on point and can easily be overlooked (hence the writing of this answer): it points to an explanation from Michael Bayer.