How to find duplicates across two columns, not just one

I have a MySQL database table with two columns that interest me. Each column can contain duplicates on its own, but the same pair of values should never appear in more than one row.

stone_id can have duplicates as long as upcharge_title is different for each, and vice versa. But say, for example, stone_id = 412 and upcharge_title = "sapphire": that combination should only occur once.

This is ok:

stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "ruby"

This is NOT ok:

stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "sapphire"

Is there a query that will find rows where both fields are duplicated together? And, if possible, is there a way to set up my database so it does not allow that?

I am using MySQL version 4.1.22


Solution 1:

You should set up a composite unique key on the two fields. This requires the combination of stone_id and upcharge_title to be unique for each row, while still allowing either column to repeat on its own.
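A minimal sketch of such a key in MySQL, assuming the table is called your_table (the same placeholder used in the query further down) and using uq_stone_upcharge as a purely illustrative index name:

```sql
-- Assumed names: your_table and uq_stone_upcharge are placeholders.
-- This statement fails with a duplicate-entry error while duplicate
-- (stone_id, upcharge_title) pairs still exist, so clean those up first.
ALTER TABLE your_table
    ADD UNIQUE KEY uq_stone_upcharge (stone_id, upcharge_title);
```

If upcharge_title is a TEXT or BLOB column, MySQL needs a prefix length in the index definition, for example upcharge_title(100). In that case uniqueness is only checked on that prefix.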

To find the existing duplicates, try this:

select   stone_id,
         upcharge_title,
         count(*)
from     your_table
group by stone_id,
         upcharge_title
having   count(*) > 1
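The query above returns one line per duplicated pair. If you want to inspect every offending row instead, one option is to join the table back to that result. This is a sketch only; your_table is still a placeholder, and derived tables like this need MySQL 4.1 or later:

```sql
-- List all rows whose (stone_id, upcharge_title) pair occurs more than once.
-- Note: pairs containing NULL are grouped by the inner query but will not
-- match in the join, because NULL = NULL is not true.
SELECT t.*
FROM   your_table AS t
JOIN   (SELECT   stone_id, upcharge_title
        FROM     your_table
        GROUP BY stone_id, upcharge_title
        HAVING   COUNT(*) > 1) AS d
  ON   d.stone_id = t.stone_id
 AND   d.upcharge_title = t.upcharge_title
ORDER BY t.stone_id, t.upcharge_title;
```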

Solution 2:

I found it helpful to add a unique index using ALTER IGNORE, which removes the duplicates and enforces unique records. That sounds like what you would like to do. The syntax would be:

ALTER IGNORE TABLE `table` ADD UNIQUE INDEX(`id`, `another_id`, `one_more_id`);

This adds the unique constraint, so you can never have duplicate records again, and the IGNORE deletes the existing duplicates, keeping only the first row of each duplicate set.

You can read more about the ALTER IGNORE here: http://mediakey.dk/~cc/mysql-remove-duplicate-entries/

Update: I was informed by @Inquisitive that this may fail in versions of MySQL > 5.5:

It fails on MySQL > 5.5 with InnoDB tables, and in Percona, because of their InnoDB fast index creation feature [http://bugs.mysql.com/bug.php?id=40344]. In this case, first run set session old_alter_table=1 and then the above command will work fine.

Update - ALTER IGNORE Removed In 5.7

From the docs:

As of MySQL 5.6.17, the IGNORE clause is deprecated and its use generates a warning. IGNORE is removed in MySQL 5.7.

One of the MySQL devs gives two alternatives:

  • Group by the unique fields and delete as seen above
  • Create a new table, add a unique index, and use INSERT IGNORE, e.g.:
CREATE TABLE duplicate_row_table LIKE regular_row_table;
ALTER TABLE duplicate_row_table ADD UNIQUE INDEX (id, another_id);
INSERT IGNORE INTO duplicate_row_table SELECT * FROM regular_row_table;
DROP TABLE regular_row_table;
RENAME TABLE duplicate_row_table TO regular_row_table;

But depending on the size of your table, this may not be practical, since it copies every row into the new table.
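The first alternative (group by the unique fields and delete) is not spelled out above. One way it could look is a self-join delete, sketched here under a loud assumption: the table has a unique auto-increment column, called id in this example, which the question does not mention. The table name your_table is a placeholder, and the columns are the ones from the question. Try it on a copy of the table first.

```sql
-- Assumption: your_table has a unique auto-increment column named id.
-- For each (stone_id, upcharge_title) pair, keep the row with the lowest id
-- and delete every other row carrying the same pair.
DELETE t1
FROM   your_table AS t1
JOIN   your_table AS t2
  ON   t2.stone_id = t1.stone_id
 AND   t2.upcharge_title = t1.upcharge_title
 AND   t2.id < t1.id;
```

Once this has run, the grouped duplicate query should return no rows, and a plain ALTER TABLE ... ADD UNIQUE on the two columns should succeed without IGNORE.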

Solution 3:

You can find duplicates like this:

select
    stone_id, upcharge_title, count(*)
from
    particulartable
group by
    stone_id, upcharge_title
having
    count(*) > 1

Solution 4:

Incidentally, a composite unique constraint on the table would prevent this from occurring in the first place.

ALTER TABLE your_table
    ADD UNIQUE (stone_id, upcharge_title)

(This is valid T-SQL, and MySQL accepts the same ADD UNIQUE syntax. Replace your_table with the real table name.)

Solution 5:

To find the duplicates:

select stone_id, upcharge_title from tablename group by stone_id, upcharge_title having count(*)>1

To prevent this in the future, create a composite unique key on these two fields.