How do I find and delete duplicate music tracks?

Solution 1:

You can use fdupes, as suggested in the answer to the question »How to find and delete duplicate files«. Here is an example:

mkdir -p "Music/Prefuse 73/One Word Extinguisher/"
dd if=/dev/urandom of=Music/Prefuse\ 73/One\ Word\ Extinguisher/07.Detchibe.mp3 bs=1023 count=2048
  2048+0 records in
  2048+0 records out
  2095104 bytes (2.1 MB) copied, 0.379806 s, 5.5 MB/s
cp Music/Prefuse\ 73/One\ Word\ Extinguisher/07.Detchibe.mp3 Music/Prefuse\ 73/One\ Word\ Extinguisher/"07 - Detchibe.mp3"
fdupes -rd .
  [1] ./Music/Prefuse 73/One Word Extinguisher/07.Detchibe.mp3
  [2] ./Music/Prefuse 73/One Word Extinguisher/07 - Detchibe.mp3

  Set 1 of 1, preserve files [1 - 2, all]:

First I created the directory from your example. Then I filled a file with random data and copied it to a second file. When I run fdupes -rd, it finds the two identical files and asks which ones to preserve; the others are deleted.
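According to its man page, fdupes finds duplicates by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison. As a rough illustration of the idea (not how fdupes is implemented), the sketch below groups files by MD5 checksum using GNU coreutils. find_dupes is a hypothetical name, and unlike fdupes it skips the final byte-by-byte check.

```shell
#!/usr/bin/env bash
# find_dupes DIR: print groups of files with identical MD5 checksums,
# one "checksum  path" line per file, with a blank line between groups.
# Rough approximation of fdupes: no byte-by-byte check after the hash match.
# Requires GNU coreutils (md5sum, and uniq's -w / --all-repeated options).
# md5sum escapes names containing backslashes or newlines; those are not handled here.
find_dupes() {
  find "${1:-.}" -type f -exec md5sum -- {} + |
    sort |
    uniq -w32 --all-repeated=separate   # compare only the 32-character hash
}
```

For example, find_dupes ~/Music lists every group of identical files under ~/Music.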

If you have lots of files, you can use the -1 option, which makes fdupes print each set of duplicates on a single line. You can then process that output with xargs and other shell features.
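Without -1, fdupes prints one filename per line with a blank line between sets, which is easier to split safely when names contain spaces. A minimal sketch, assuming that default format, of an awk filter (all_but_first is a hypothetical name) that keeps the first file of each set and prints the rest. This is the same selection fdupes -f makes; writing it out lets you adapt the rule before handing the names to xargs.

```shell
#!/usr/bin/env bash
# all_but_first: read fdupes' default output on stdin (one filename per line,
# a blank line between duplicate sets) and print every file except the first
# one of each set.
all_but_first() {
  awk '
    BEGIN    { first = 1 }
    $0 == "" { first = 1; next }   # blank line: the next set starts
    first    { first = 0; next }   # first file of a set: keep it, print nothing
             { print }             # every other file in the set is a duplicate
  '
}
```

For example, fdupes -r ~/Music | all_but_first | xargs -d '\n' -r ls -l -- previews what would be removed. The -d '\n' and -r options are GNU xargs features; swap ls -l for rm only after checking the list.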

Solution 2:

I found a fairly simple command chain. Many thanks to @Oli.

fdupes -rf --quiet ~/Desktop/Dupes2/ | while IFS= read -r i; do [ -n "$i" ] && mv -- "$i" ~/Desktop/Dupes/; done

This uses fdupes to find the duplicates recursively (-r), omitting the first file in each set (-f). Bash reads the output line by line with read and hands each filename to mv, which moves all the duplicates to another directory. Note the quotes around $i in the loop: they keep filenames with spaces and other awkward punctuation intact, which fdupes' own output does not handle well (even with -1/--sameline).
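The same loop can be made a little more robust. A minimal sketch, assuming GNU coreutils (for mv --backup=numbered); move_dupes is a hypothetical helper name. It uses read -r so backslashes in names survive, skips blank lines (such as separators between sets), and keeps both files when two duplicates share a name instead of letting mv overwrite one with the other.

```shell
#!/usr/bin/env bash
# move_dupes DEST: read one filename per line on stdin and move each file into DEST.
# Intended for output such as `fdupes -rf --quiet DIR`.
move_dupes() {
  local dest=$1 f
  mkdir -p -- "$dest"
  while IFS= read -r f; do
    [ -n "$f" ] || continue                    # skip blank separator lines
    # GNU mv: if DEST already has this name, rename the old one to NAME.~1~ etc.
    mv --backup=numbered -- "$f" "$dest"/
  done
}
```

Usage: fdupes -rf --quiet ~/Desktop/Dupes2/ | move_dupes ~/Desktop/Dupes. Keep the destination outside the directory being scanned.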

Solution 3:

The answers to »Manually set track listen count in Banshee?« describe how to access the database that Banshee uses to store all track information.

Once you're connected to the database, go to the Execute query tab and paste

select tweaked_track, count(*) from 
  (select replace(replace(replace(title, ' ', ''), '-', ''), '.', '') as tweaked_track 
   from coretracks) 
group by tweaked_track 
having count(*) > 1 
order by 2 desc, 1;

into the SQL string box, then click 'Execute query'. This lists your track titles with spaces, dashes, and periods ignored, together with the number of tracks sharing each title. If there are other characters you want to ignore, add them to the query in the same pattern (i.e. add replace( before the first existing replace, and after the last ) on that line add , '[character you want removed]', '').

(I don't know how much you know about SQL; if you need more details, post a comment.)
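To see what the normalization in the query does, here is the same rule applied to a plain list of titles outside Banshee. This is only an illustration of the query's logic in awk (dup_titles is a hypothetical name); it does not read Banshee's database, and how you obtain the title list is up to you.

```shell
#!/usr/bin/env bash
# dup_titles: read one track title per line on stdin, strip spaces, dashes and
# periods (like the nested replace() calls in the query), and print every
# normalized title that occurs more than once, preceded by its count.
dup_titles() {
  awk '{
         key = $0
         gsub(/[ .-]/, "", key)   # to ignore more characters, add them in the brackets; keep the hyphen last
         count[key]++
       }
       END {
         for (key in count)
           if (count[key] > 1) print count[key], key
       }'
}
```

For example, feeding it the titles "07 - Detchibe" and "07.Detchibe" prints "2 07Detchibe".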

This gives you a list of titles only; you will have to do the actual deleting yourself.

There may be a better way of doing this, but if there is, I don't know about it.

Once you have a big list of files to delete (either from my method or from fdupes, as others have mentioned), put the list into a text file, one file per line. Make sure one of the following is true:

Option #1: The filenames contain the full path. For example, the file might contain:

/home/doneill/music/weird_al/duped_file.mp3
/home/doneill/music/weird_al/another_dupe.mp3
/home/doneill/music/bach/baroque_dupe.mp3

Option #2: The filenames contain relative paths, and the file with the list is saved in the parent folder. For example, if your list were saved in /home/doneill/music/, it would contain:

weird_al/duped_file.mp3
weird_al/another_dupe.mp3
bach/baroque_dupe.mp3
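Before deleting anything, it can help to confirm that every line really names an existing file, which catches a list saved in the wrong folder or a path typed wrongly. A minimal sketch; check_list is a hypothetical helper name, and it should be run from the same folder you will delete from so relative paths (Option #2) resolve the same way they will for rm.

```shell
#!/usr/bin/env bash
# check_list LIST: print "missing: PATH" for every line of LIST that does not
# name an existing regular file; return 1 if anything is missing, else 0.
check_list() {
  local f missing=0
  # "|| [ -n "$f" ]" also handles a last line without a trailing newline
  while IFS= read -r f || [ -n "$f" ]; do
    [ -n "$f" ] || continue            # ignore blank lines
    if [ ! -f "$f" ]; then
      printf 'missing: %s\n' "$f"
      missing=1
    fi
  done < "$1"
  return "$missing"
}
```

For example, cd /home/doneill/music/ followed by check_list filelist.txt prints nothing if every listed file is present.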

In either case, open a terminal window and change to the folder that contains the list file (for example, cd /home/doneill/music/).

Type in:

while IFS= read -r a; do echo "$a"; done < filelist.txt

(Replace filelist.txt with the name of your list file.) This should print all the files you want to delete. Take a moment to double-check the list. If it is right, type:

while IFS= read -r a; do rm -- "$a"; done < filelist.txt

This basically tells your computer: for each line in the file filelist.txt, remove a file with the name listed.
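If you prefer to avoid the loop, GNU xargs can do the same job: -d '\n' makes each line exactly one argument, so spaces in names are safe, and -r skips running rm when the list is empty. A minimal sketch; delete_listed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# delete_listed LIST: remove every file named in LIST (one path per line).
# Requires GNU xargs for -d (delimiter) and -r (no run if input is empty).
delete_listed() {
  # Drop blank lines first: with -d '\n' an empty line would become an empty argument.
  grep -v '^$' -- "$1" | xargs -d '\n' -r rm -v --
}
```

Run it from the same folder as before (delete_listed filelist.txt). For a dry run, change rm -v to ls -l first.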