Multiple DISTINCT ON clauses in PostgreSQL
Is it possible to select rows that are DISTINCT ON
some separate, independent sets of columns?
Suppose I want all the rows which match the following conditions:
- distinct on
(name, birth)
- distinct on
(name, height)
So that, out of the following table, the rows marked with a red cross would not be distinct (with an indication of the failing clause):
name birth height
--------------------------
William 1976 1.82
James 1981 1.68
Mike 1976 1.68
Tom 1967 1.79
William 1976 1.74 ❌ (name, birth)
William 1981 1.82 ❌ (name, height)
Tom 1978 1.92
Mike 1963 1.68 ❌ (name, height)
Tom 1971 1.86
James 1981 1.77 ❌ (name, birth)
Tom 1971 1.89 ❌ (name, birth)
In the above example, if the DISTINCT ON
clause had just been DISTINCT ON (name, birth, height)
, then all the rows would have been considered distinct.
Tried and didn't work:
SELECT DISTINCT ON (name, birth) (name, height) ...
SELECT DISTINCT ON (name, birth), (name, height) ...
SELECT DISTINCT ON ((name, birth), (name, height)) ...
SELECT DISTINCT ON (name, birth) AND (name, height) ...
SELECT DISTINCT ON (name, birth) AND ON (name, height) ...
SELECT DISTINCT ON (name, birth) DISTINCT ON (name, height) ...
SELECT DISTINCT ON (name, birth), DISTINCT ON (name, height) ...
As commented, there is ambiguity in the question. The number of result rows can differ for every call. If you are satisfied with arbitrary results, @klin's solution is good enough. Else, you need to define requiremens more closely. Like:
distinct on (name, birth)
, pick smallest height first, then smallest ID as tiebreaker
Or:
distinct on (name, height)
, pick earliest birth first, then smallest ID as tiebreaker
Your table should have a primary key (or some way to identify rows uniquely):
CREATE TEMP TABLE tbl (
tbl_id serial PRIMARY KEY
, name text
, birth int
, height numeric);
INSERT INTO tbl (name, birth, height)
VALUES
('William', 1976, 1.82)
, ('James', 1981, 1.68)
, ('Mike', 1976, 1.68)
, ('Tom', 1967, 1.79)
, ('William', 1976, 1.74)
, ('William', 1981, 1.82)
, ('Tom', 1978, 1.92)
, ('Mike', 1963, 1.68)
, ('Tom', 1971, 1.86)
, ('James', 1981, 1.77)
, ('Tom', 1971, 1.89);
Query:
SELECT DISTINCT ON (name, height) *
FROM (
SELECT DISTINCT ON (name, birth) *
FROM tbl
ORDER BY name, birth, height, tbl_id -- pick smallest height, ID as tiebreaker
) sub
ORDER BY name, height, birth, tbl_id; -- pick earliest birth, ID as tiebreaker
tbl_id | name | birth | height
--------+---------+-------+--------
2 | James | 1981 | 1.68
8 | Mike | 1963 | 1.68
4 | Tom | 1967 | 1.79
9 | Tom | 1971 | 1.86
7 | Tom | 1978 | 1.92
5 | William | 1976 | 1.74
6 | William | 1981 | 1.82
(7 rows) -- !!!
A DISTINCT ON
query without deterministic ORDER BY
can return any arbitrary row from each set of dupes. Applied once, you still get a deterministic number of rows (with arbitrary picks). Applied repeatedly, the resulting number of rows is arbitrary, too. Related:
- Select first row in each GROUP BY group?
Use a derived table:
with my_table(name, birth, height) as (
values
('William', 1976, 1.82),
('James', 1981, 1.68),
('Mike', 1976, 1.68),
('Tom', 1967, 1.79),
('William', 1976, 1.74), -- ? (name, birth)
('William', 1981, 1.82), -- ? (name, height)
('Tom', 1978, 1.92),
('Mike', 1963, 1.68), -- ? (name, height)
('Tom', 1971, 1.86),
('James', 1981, 1.77), -- ? (name, birth)
('Tom', 1971, 1.89) -- ? (name, birth)
)
select distinct on (name, height) *
from (
select distinct on (name, birth) *
from my_table
) s
name | birth | height
---------+-------+--------
James | 1981 | 1.68
Mike | 1963 | 1.68
Tom | 1967 | 1.79
Tom | 1971 | 1.89
Tom | 1978 | 1.92
William | 1976 | 1.82
(6 rows)