How can I improve this query without HAVING Clause
For big tables it's extremely expensive to join and aggregate all rows before finally filtering the few qualifying rows.
It should be faster by orders of magnitude to filter qualifying documents first, then grab all documents for the same user(s), aggregate, and finally join to users:
SELECT u.username, d.docs
FROM (
SELECT user_id, jsonb_agg(jsonb_build_object('name', d.name, 'value', d.value)) AS docs
FROM documents d1
JOIN documents d USING (user_id)
WHERE d1.name = 'doc1'
AND d1.value = 2
-- AND d.name IS NOT NULL -- strip NULLs early
-- AND d.value IS NOT NULL -- if not defined NOT NULL anyway
GROUP BY 1
) d
JOIN users u ON u.id = d.user_id;
While being at it, I removed jsonb_strip_nulls()
since all processed columns are defined NOT NULL anyway. Also cheaper.
Possibly simplify to just jsonb_build_object(d.name, d.value)
.
For the first step, an index on documents(name, value)
will help a lot. Maybe even on documents(name, value, user_id)
to get index-only scans (depends).
It should be safe to assume there is also an index on documents(user_id)
. Helps the next step. Again, documents(user_id, name, value)
for index-only scans.
And, finally, an index on users(id)
. Should be a given. Again, users(id, username)
for index-only scans.
If (name, value)
is not UNIQUE
per user (like it seems to be the case), use EXISTS
instead to avoid duplicates:
SELECT u.username, d.docs
FROM (
SELECT user_id, jsonb_agg(jsonb_build_object('name', d.name, 'value', d.value)) AS docs
FROM documents d
WHERE EXISTS (
SELECT FROM documents d1
WHERE d1.user_id = d.user_id
AND d1.name = 'doc1'
AND d1.value = 2
)
GROUP BY 1
) d
JOIN users u ON u.id = d.user_id;
Similar query plan, same indexes can be used.
Related:
-
Query with LEFT JOIN not returning rows for count of 0
-
Is a composite index also good for queries on the first field?