How get the T-SQL code to find duplicates?

Well, if you have entire rows as duplicates in your table, you've at least not got a primary key set up for that table, otherwise at least the primary key value would be different.

However, here's how to build a SQL to get duplicates over a set of columns:

SELECT col1, col2, col3, col4
FROM table
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1

This will find rows which, for columns col1-col4, has the same combination of values, more than once.

For instance, in the following table, rows 2+3 would be duplicates:

PK    col1    col2    col3    col4    col5
1       1       2       3       4      6
2       1       3       4       7      7
3       1       3       4       7      10
4       2       3       1       4      5

The two rows share common values in columns col1-col4, and thus, by that SQL, is considered duplicates. Expand the list of columns to contain all the columns you wish to analyze this for.

If you're using SQL Server 2005+, you can use the following code to see all the rows along with other columns:

SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
FROM table

Youd can also delete (or otherwise work with) duplicates using this technique:

WITH cte AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
    FROM table
)
DELETE FROM cte WHERE DuplicateRowNumber > 1

ROW_NUMBER is extremely powerful - there is much you can do with it - see the BOL article on it at http://msdn.microsoft.com/en-us/library/ms186734.aspx

How get the T-SQL code to find duplicates?

Related

Recent Posts