How do I find duplicates across multiple columns?
Solution 1:
Duplicated id
for pairs name
and city
:
select s.id, t.*
from [stuff] s
join (
select name, city, count(*) as qty
from [stuff]
group by name, city
having count(*) > 1
) t on s.name = t.name and s.city = t.city
Solution 2:
SELECT name, city, count(*) as qty
FROM stuff
GROUP BY name, city HAVING count(*)> 1
Solution 3:
Something like this will do the trick. Don't know about performance, so do make some tests.
select
id, name, city
from
[stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)
Solution 4:
Using count(*) over(partition by...)
provides a simple and efficient means to locate unwanted repetition, whilst also list all affected rows and all wanted columns:
SELECT
t.*
FROM (
SELECT
s.*
, COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
FROM stuff s
) t
WHERE t.qty > 1
ORDER BY t.name, t.city
While most recent RDBMS versions support count(*) over(partition by...)
MySQL V 8.0 introduced "window functions", as seen below (in MySQL 8.0)
CREATE TABLE stuff( id INTEGER NOT NULL ,name VARCHAR(60) NOT NULL ,city VARCHAR(60) NOT NULL );
INSERT INTO stuff(id,name,city) VALUES (904834,'jim','London') , (904835,'jim','London') , (90145,'Fred','Paris') , (90132,'Fred','Paris') , (90133,'Fred','Paris') , (923457,'Barney','New York') # not expected in result ;
SELECT t.* FROM ( SELECT s.* , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty FROM stuff s ) t WHERE t.qty > 1 ORDER BY t.name, t.city
id | name | city | qty -----: | :--- | :----- | --: 90145 | Fred | Paris | 3 90132 | Fred | Paris | 3 90133 | Fred | Paris | 3 904834 | jim | London | 2 904835 | jim | London | 2
db<>fiddle here
Window functions. MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG(). For more information, see Section 12.21, “Window Functions”.
Solution 5:
A little late to the game on this post, but I found this way to be pretty flexible / efficient
select
s1.id
,s1.name
,s1.city
from
stuff s1
,stuff s2
Where
s1.id <> s2.id
and s1.name = s2.name
and s1.city = s2.city