How can I improve this query without HAVING Clause

For big tables it's extremely expensive to join and aggregate all rows before finally filtering the few qualifying rows.

It should be faster by orders of magnitude to filter qualifying documents first, then grab all documents for the same user(s), aggregate, and finally join to users:

SELECT u.username, d.docs
FROM  (
   SELECT user_id, jsonb_agg(jsonb_build_object('name', d.name, 'value', d.value)) AS docs
   FROM   documents d1
   JOIN   documents d USING (user_id)
   WHERE  d1.name = 'doc1'
   AND    d1.value = 2
   -- AND    d.name  IS NOT NULL  -- strip NULLs early 
   -- AND    d.value IS NOT NULL  -- if not defined NOT NULL anyway
   GROUP  BY 1
   ) d
JOIN   users u ON u.id = d.user_id;

While being at it, I removed jsonb_strip_nulls() since all processed columns are defined NOT NULL anyway. Also cheaper.

Possibly simplify to just jsonb_build_object(d.name, d.value).

For the first step, an index on documents(name, value) will help a lot. Maybe even on documents(name, value, user_id) to get index-only scans (depends).

It should be safe to assume there is also an index on documents(user_id). Helps the next step. Again, documents(user_id, name, value) for index-only scans.

And, finally, an index on users(id). Should be a given. Again, users(id, username) for index-only scans.

If (name, value) is not UNIQUE per user (like it seems to be the case), use EXISTS instead to avoid duplicates:

SELECT u.username, d.docs
FROM  (
   SELECT user_id, jsonb_agg(jsonb_build_object('name', d.name, 'value', d.value)) AS docs
   FROM   documents d
   WHERE  EXISTS (
      SELECT FROM documents d1
      WHERE  d1.user_id = d.user_id
      AND    d1.name = 'doc1'
      AND    d1.value = 2
      )
   GROUP  BY 1
   ) d
JOIN   users u ON u.id = d.user_id;

Similar query plan, same indexes can be used.

Related:

  • Query with LEFT JOIN not returning rows for count of 0

  • Is a composite index also good for queries on the first field?