SQLAlchemy - subquery in a WHERE clause
Solution 1:
This should work (different SQL, same result):
    from sqlalchemy import and_, func

    # Subquery: the latest post_time per user
    t = Session.query(
        Posts.user_id,
        func.max(Posts.post_time).label('max_post_time'),
    ).group_by(Posts.user_id).subquery('t')

    # Keep only the posts whose post_time equals that per-user maximum
    query = Session.query(User, Posts).filter(and_(
        User.user_id == Posts.user_id,
        User.user_id == t.c.user_id,
        Posts.post_time == t.c.max_post_time,
    ))

    for user, post in query:
        print(user.user_id, post.post_id)
Here c stands for 'columns': t.c.user_id is the user_id column of the subquery t.
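If you want to try the grouped-subquery approach end to end, here is a minimal self-contained sketch. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database; the User and Posts models, the table names, and the sample rows are all made up for illustration and are not the asker's real schema.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):  # hypothetical model, for illustration only
    __tablename__ = "users"
    user_id = Column(Integer, primary_key=True)


class Posts(Base):  # hypothetical model, for illustration only
    __tablename__ = "posts"
    post_id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.user_id"))
    post_time = Column(DateTime)


def latest_posts(session):
    """Return (user_id, post_id) for each user's most recent post."""
    # Subquery: one row per user with that user's latest post_time
    t = (
        select(Posts.user_id, func.max(Posts.post_time).label("max_post_time"))
        .group_by(Posts.user_id)
        .subquery("t")
    )
    stmt = (
        select(User.user_id, Posts.post_id)
        .where(
            User.user_id == Posts.user_id,
            User.user_id == t.c.user_id,
            Posts.post_time == t.c.max_post_time,
        )
        .order_by(User.user_id)
    )
    return [tuple(row) for row in session.execute(stmt)]


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        User(user_id=1),
        User(user_id=2),
        Posts(post_id=1, user_id=1, post_time=datetime(2024, 1, 1)),
        Posts(post_id=2, user_id=1, post_time=datetime(2024, 3, 1)),
        Posts(post_id=3, user_id=2, post_time=datetime(2024, 2, 1)),
    ])
    session.commit()
    rows = latest_posts(session)

print(rows)
```

Note that if a user has two posts with exactly the same post_time, both come back, since the match is on the timestamp and not on a unique key.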
Solution 2:
The previous answer works, but the exact SQL you asked for can also be written much like the actual statement:
    # Printing a Query shows the SQL it would emit
    print(
        s.query(User, Posts)
        .outerjoin(Posts.user)
        .filter(
            Posts.post_time
            == s.query(func.max(Posts.post_time))
            .filter(Posts.user_id == User.user_id)
            .correlate(User)
            .as_scalar()  # mark the subquery as a scalar
        )
    )
I guess the "concept" that isn't necessarily apparent is that as_scalar() is currently needed to establish a subquery as a "scalar" (it should probably assume that from the context against ==).
Edit: Confirmed, that was buggy behavior, and ticket #2190 is now completed. In the current tip, or as of release 0.7.2, as_scalar() is called automatically and the above query can be written as:
    print(
        s.query(User, Posts)
        .outerjoin(Posts.user)
        .filter(
            Posts.post_time
            == s.query(func.max(Posts.post_time))
            .filter(Posts.user_id == User.user_id)
            .correlate(User)
        )
    )
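For readers on newer versions: as_scalar() is deprecated since SQLAlchemy 1.4 in favor of scalar_subquery(). Below is a rough sketch of the same correlated-subquery idea in the select() style, assuming SQLAlchemy 1.4 or later and in-memory SQLite. The models and sample rows are hypothetical, and the join is written with an explicit ON clause because no relationship is defined here.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):  # hypothetical model, for illustration only
    __tablename__ = "users"
    user_id = Column(Integer, primary_key=True)


class Posts(Base):  # hypothetical model, for illustration only
    __tablename__ = "posts"
    post_id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.user_id"))
    post_time = Column(DateTime)


# Scalar subquery: latest post_time for the user of the enclosing row.
# correlate(User) takes only "users" from the outer query, so "posts"
# stays in the subquery's own FROM clause.
newest = (
    select(func.max(Posts.post_time))
    .where(Posts.user_id == User.user_id)
    .correlate(User)
    .scalar_subquery()
)

stmt = (
    select(User.user_id, Posts.post_id)
    .join_from(User, Posts, Posts.user_id == User.user_id)
    .where(Posts.post_time == newest)
    .order_by(User.user_id)
)

engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        User(user_id=1),
        User(user_id=2),
        Posts(post_id=1, user_id=1, post_time=datetime(2024, 1, 1)),
        Posts(post_id=2, user_id=1, post_time=datetime(2024, 3, 1)),
        Posts(post_id=3, user_id=2, post_time=datetime(2024, 2, 1)),
    ])
    session.commit()
    rows = [tuple(row) for row in session.execute(stmt)]

print(stmt)  # shows the generated SQL with the subquery in the WHERE clause
print(rows)
```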
Solution 3:
This is usually expressed much like the actual SQL: you create a subquery that returns a single result and compare against it. What can be a real pain, however, is having to use a table in the subquery that you are already querying or joining on.
The solution is to create an aliased version of the model and reference that in the subquery.
So let's say you are already working with an existing Posts model (model in the code below) and have some basic query ready. Now, to get only the latest post from each user, you'd filter the query like this:
    from sqlalchemy.orm import aliased

    posts2 = aliased(Posts)  # create aliased version

    query = query.filter(
        model.post_id
        == Posts.query  # create query directly from model, NOT from the aliased version!
        .with_entities(posts2.post_id)  # only select column "post_id"
        .filter(posts2.user_id == model.user_id)
        .order_by(posts2.post_id.desc())  # assume higher id == newer post
        .limit(1)  # we must limit to a single row so we only get 1 value
    )
I purposely did not use func.max because I consider that the simpler version and it's already covered in other answers. I think this example will be useful to people who find this question because they are looking for a way to run a subquery against the same table.
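Since the snippet above relies on a Posts.query attribute and a pre-built query that aren't shown, here is a self-contained sketch of the same aliased-subquery idea using plain SQLAlchemy (1.4 or later assumed) with in-memory SQLite. The model, table name, and sample data are invented for the example, and it keeps the same assumption that a higher post_id means a newer post.

```python
from sqlalchemy import Column, Integer, create_engine, select
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()


class Posts(Base):  # hypothetical model, for illustration only
    __tablename__ = "posts"
    post_id = Column(Integer, primary_key=True)
    user_id = Column(Integer)


posts2 = aliased(Posts)  # second reference to the same table

# Scalar subquery: id of the newest post by the outer row's user.
# Assumes a higher post_id means a newer post.
newest_id = (
    select(posts2.post_id)
    .where(posts2.user_id == Posts.user_id)
    .order_by(posts2.post_id.desc())
    .limit(1)
    .correlate(Posts)  # "posts" comes from the outer query; the alias stays inside
    .scalar_subquery()
)

stmt = (
    select(Posts.user_id, Posts.post_id)
    .where(Posts.post_id == newest_id)
    .order_by(Posts.user_id)
)

engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Posts(post_id=1, user_id=1),
        Posts(post_id=2, user_id=1),
        Posts(post_id=3, user_id=2),
    ])
    session.commit()
    rows = [tuple(row) for row in session.execute(stmt)]

print(rows)
```

The alias is what lets the inner and outer references to the posts table be told apart in the generated SQL.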