How can I delete duplicates from a table based on a location id and name

Deleting duplicates is a common pattern. You apply a sequence number to all of the duplicates, then delete any that aren't first. In this case I order arbitrarily but you can choose to keep the one with the lowest PK value or that was modified last or whatever - just update the ORDER BY to sort the one you want to keep first.

;WITH cte AS 
(
  SELECT *, rn = ROW_NUMBER() OVER 
    (PARTITION BY Name, Location_ID ORDER BY @@SPID)
  FROM dbo.TableName
)
DELETE cte WHERE rn > 1;

Then to count, assuming there can't be two different Location_IDs for a given Location_Name (this is why schema + sample data is so helpful):

SELECT Location_Name, People = COUNT(Name)
  FROM dbo.TableName
  GROUP BY Location_Name;

Example db<>fiddle

If Location_Name and Location_ID are not tightly coupled (e.g. there could be Location_ID = 4, Location_Name = Place 1 and Location_ID = 4, Location_Name = Place 2 then you're going to have to define how to determine which place to display if you group by Location_ID, or admit that perhaps one of those columns is meaningless.

If Location_Name and Location_ID are tightly coupled, they shouldn't both be stored in this table. You should have a lookup/dimension table that stores both of those columns (once!) and you use the smallest data type as the key you record in the fact table (where it is repeated over and over again). This has several benefits:

Scanning the bigger table is faster, because it's not as wide
Storage is reduced because you're not repeating long strings over and over and over again
Aggregation is clearer and you can join to get names after aggregation, which will be faster
If you need to change a location's name, you only need to change it in exactly one place

How can I delete duplicates from a table based on a location id and name

Related

Recent Posts