PostgreSQL DISTINCT ON with different ORDER BY
I want to run this query:
SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM purchases
WHERE purchases.product_id = 1
ORDER BY purchases.purchased_at DESC
But I get this error:
PG::Error: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Adding address_id
as first ORDER BY
expression silences the error, but I really don't want to add sorting over address_id
. Is it possible to do without ordering by address_id
?
Solution 1:
Documentation says:
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
Official documentation
So you'll have to add the address_id
to the order by.
Alternatively, if you're looking for the full row that contains the most recent purchased product for each address_id
and that result sorted by purchased_at
then you're trying to solve a greatest N per group problem which can be solved by the following approaches:
The general solution that should work in most DBMSs:
SELECT t1.* FROM purchases t1
JOIN (
SELECT address_id, max(purchased_at) max_purchased_at
FROM purchases
WHERE product_id = 1
GROUP BY address_id
) t2
ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
ORDER BY t1.purchased_at DESC
A more PostgreSQL-oriented solution based on @hkf's answer:
SELECT * FROM (
SELECT DISTINCT ON (address_id) *
FROM purchases
WHERE product_id = 1
ORDER BY address_id, purchased_at DESC
) t
ORDER BY purchased_at DESC
Problem clarified, extended and solved here: Selecting rows ordered by some column and distinct on another
Solution 2:
You can order by address_id in an subquery, then order by what you want in an outer query.
SELECT * FROM
(SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM "purchases"
WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC )
ORDER BY purchased_at DESC
Solution 3:
A subquery can solve it:
SELECT *
FROM (
SELECT DISTINCT ON (address_id) *
FROM purchases
WHERE product_id = 1
) p
ORDER BY purchased_at DESC;
Leading expressions in ORDER BY
have to agree with columns in DISTINCT ON
, so you can't order by different columns in the same SELECT
.
Only use an additional ORDER BY
in the subquery if you want to pick a particular row from each set:
SELECT *
FROM (
SELECT DISTINCT ON (address_id) *
FROM purchases
WHERE product_id = 1
ORDER BY address_id, purchased_at DESC -- get "latest" row per address_id
) p
ORDER BY purchased_at DESC;
If purchased_at
can be NULL
, use DESC NULLS LAST
- and match your index for best performance. See:
- Sort by column ASC, but NULL values first?
- Why does ORDER BY NULLS LAST affect the query plan on a primary key?
Related, with more explanation:
- Select first row in each GROUP BY group?
- Sort by column ASC, but NULL values first?