SQLAlchemy: Counting multiple relationships - best way?
Imagine the following (example) datamodel:
class Organization(db.Model):
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
friendly_name = db.Column(db.Text, nullable=False)
users = db.relationship('Users', back_populates='organizations')
groups = db.relationship('Groups', back_populates='organizations')
class User(db.Model):
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
organization_id = db.Column(db.Integer, db.ForeignKey('organizations.id'))
organizations = relationship("Organization", back_populates="users")
class Group(db.Model):
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
organization_id = db.Column(db.Integer, db.ForeignKey('organizations.id'))
organizations = relationship("Organization", back_populates="groups")
(so basically an Organization has User and Group relationships)
What we want is to retrieve the counts for users and groups. Result should be similar to the following:
id | friendly_name | users_count | groups_count |
---|---|---|---|
1 | o1 | 33 | 3 |
2 | o2 | 12 | 2 |
3 | o3 | 1 | 0 |
This can be achieved with a query similar to
query = db.session.query(
Organization.friendly_name,
func.count(User.id.distinct()).label('users_count'),
func.count(Group.id.distinct()).label('groups_count'),
) \
.outerjoin(User, Organization.users) \
.outerjoin(Group, Organization.groups) \
.group_by(Organization.id)
which seems quite overkill. The first intuitive approach would be something like
query = db.session.query(
Organization.friendly_name,
func.count(distinct(Organization.users)).label('users_count'),
func.count(distinct(Organization.groups).label('groups_count'),
)# with or without outerjoins
which is not working (Note: With one relationship it would work).
a) Whats the difference between User.id.distinct()
and distinct(Organization.users)
in this case?
b) What would be the best/most performant/recommended way in SQLAlchemy to get a count for each relationship an Object has?
Bonus): If instead of Organization.friendly_name
the whole Model would be selected (...query(Organization, func....)
) SQLAlchemy returns a tuple with the format t(Organization, users_count, groups_count) as result. Is there a way to just return the Organization with the two counts as additional fields? (as SQL would)
Solution 1:
b:
You can try a window function to count users and groups with good performance:
query = db.session.query(
Organization.friendly_name,
func.count().over(partition_by=(User.id, Organization.id)).label('users_count')
func.count().over(partition_by=(Group.id, Organization.id)).label('groups_count')
)
.outerjoin(User, Organization.users)
.outerjoin(Group, Organization.groups)
bonus:
To return count as a field of Organization
, you can use hybrid_property, but you would not be happy with the performance.