Deleting duplicate rows from sqlite database

I have a huge table - 36 million rows - in SQLite3. In this very large table, there are two columns:

  • hash - text
  • d - real

Some of the rows are duplicates. That is, both hash and d have the same values. If two hashes are identical, then so are the values of d. However, two identical d's does not imply two identical hash'es.

I want to delete the duplicate rows. I don't have a primary key column.

What's the fastest way to do this?


You need a way to distinguish the rows. Based on your comment, you could use the special rowid column for that.

To delete duplicates by keeping the lowest rowid per (hash,d):

delete   from YourTable
where    rowid not in
         (
         select  min(rowid)
         from    YourTable
         group by
                 hash
         ,       d
         )

I guess the fastest would be to use the very database for it: add a new table with the same columns, but with proper constraints (a unique index on hash/real pair?), iterate through the original table and try to insert records in the new table, ignoring constraint violation errors (i.e. continue iterating when exceptions are raised).

Then delete the old table and rename the new to the old one.