How to find duplicate records in PostgreSQL

I have a PostgreSQL database table called "user_links" which currently allows the following duplicate fields:

year, user_id, sid, cid

The unique constraint is currently the first field called "id", however I am now looking to add a constraint to make sure the year, user_id, sid and cid are all unique but I cannot apply the constraint because duplicate values already exist which violate this constraint.

Is there a way to find all duplicates?

The basic idea will be using a nested query with count aggregation:

select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1

You can adjust the where clause in the inner query to narrow the search.

There is another good solution for that mentioned in the comments, (but not everyone reads them):

select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1

Or shorter:

SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1

From "Find duplicate rows with PostgreSQL" here's smart solution:

select * from (
  SELECT id,
  ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
  FROM tbl
) dups
where 
dups.Row > 1

In order to make it easier I assume that you wish to apply a unique constraint only for column year and the primary key is a column named id.

In order to find duplicate values you should run,

SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);

Using the sql statement above you get a table which contains all the duplicate years in your table. In order to delete all the duplicates except of the the latest duplicate entry you should use the above sql statement.

DELETE
FROM YOUR_TABLE A USING YOUR_TABLE_AGAIN B
WHERE A.year=B.year AND A.id<B.id;

How to find duplicate records in PostgreSQL

Related

Recent Posts