Finding and removing duplicate files in OS X with a script
From: http://www.chriswrites.com/2012/02/how-to-find-and-delete-duplicate-files-in-mac-os-x/ How do I modify this so that it keeps the first version of each file it sees and deletes only the later duplicates?
Open Terminal from Spotlight or the Utilities folder. Change to the directory (folder) you want to search from (including sub-folders) using the cd command; for example, type cd ~/Documents to change to your Documents folder. At the command prompt, type the following command:
find . -size 20 \! -type d -exec cksum {} \; | sort | tee /tmp/f.tmp | cut -f 1,2 -d ' ' | uniq -d | grep -hif - /tmp/f.tmp > duplicates.txt
This method uses a simple checksum to determine whether files are identical. The names of duplicate items will be listed in a file named duplicates.txt in the current directory; open it to view the names of identical files. There are now various ways to delete the duplicates. To delete every file listed in the text file (including the first copy of each), at the command prompt type:
while read file; do rm "$file"; done < duplicates.txt
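Before running that, a dry run is worth doing, since duplicates.txt produced by the command above lists every copy, not just the extras. Here is a sketch with made-up file names in a scratch directory: echo rm prints what would be deleted without touching anything, and IFS= read -r keeps file names with leading spaces or backslashes intact.

```shell
cd "$(mktemp -d)"                  # scratch demo with hypothetical names
printf 'same\n' > keep.txt
printf 'same\n' > dupe.txt
printf '%s\n' ./dupe.txt > duplicates.txt

# dry run: show what would be removed without touching anything
while IFS= read -r file; do echo rm "$file"; done < duplicates.txt

# once the list looks right, delete for real
while IFS= read -r file; do rm "$file"; done < duplicates.txt
```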
Firstly, you'll have to reorder the first command line so that sorting happens after tee, preserving the order in which find discovered the files in /tmp/f.tmp:
find . -size 20 ! -type d -exec cksum {} \; | tee /tmp/f.tmp | cut -f 1,2 -d ' ' | sort | uniq -d | grep -hif - /tmp/f.tmp > duplicates.txt
(Note: for testing purposes on my machine I used find . -type f -exec cksum {} \; instead.)
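For reference, here is what each stage of that reordered pipeline does, run against a small scratch directory. The two 10240-byte files are only there to satisfy the -size 20 test (20 512-byte blocks), and grep's -f - reads the duplicate checksum+size pairs from the pipe; this is a sketch, not a replacement for running it on your real files.

```shell
cd "$(mktemp -d)"                              # scratch dir for the demo
head -c 10240 /dev/zero > d1.bin               # two identical 20-block files
head -c 10240 /dev/zero > d2.bin

find . -size 20 ! -type d -exec cksum {} \; |  # checksum candidates, in find order
  tee /tmp/f.tmp |                             # keep the ordered "checksum size name" list
  cut -f 1,2 -d ' ' |                          # reduce each line to checksum and size
  sort | uniq -d |                             # keep pairs that occur more than once
  grep -hif - /tmp/f.tmp \
  > duplicates.txt                             # map those pairs back to file names
cat duplicates.txt                             # lists ./d1.bin and ./d2.bin
```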
Secondly, one way to print all but the first duplicate is to use an auxiliary file, say /tmp/f2.tmp. Then we could do something like:
while IFS= read -r line; do
    checksum=$(echo "$line" | cut -f 1,2 -d' ')
    file=$(echo "$line" | cut -f 3- -d' ')
    if grep -qF "$checksum" /tmp/f2.tmp; then
        # /tmp/f2.tmp already contains the checksum,
        # so this is a later copy: print the file name
        # (printf is safer than echo when, for example, "$file" starts with "-")
        printf '%s\n' "$file"
    else
        echo "$checksum" >> /tmp/f2.tmp
    fi
done < duplicates.txt
Just make sure that /tmp/f2.tmp exists and is empty before you run this, for example through the following commands:
rm -f /tmp/f2.tmp
touch /tmp/f2.tmp
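The same "skip the first copy" logic can also be done in a single awk pass over the ordered checksum list, with no auxiliary file. This is a sketch with made-up file names; like the loop above, it keys on the checksum and size fields that cksum prints.

```shell
# Scratch demo: two identical files and one unique one.
cd "$(mktemp -d)"
printf 'hello\n' > a.txt
printf 'hello\n' > b.txt    # same contents as a.txt
printf 'world\n' > c.txt

# cksum prints "checksum size name"; print every file whose
# checksum+size pair has been seen before, i.e. all but the first copy:
cksum a.txt b.txt c.txt |
awk '{ key = $1 " " $2
       file = $0
       sub(/^[^ ]+ [^ ]+ /, "", file)   # strip checksum and size, keep the name
       if (seen[key]++) print file }'
# prints: b.txt
```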
Hope this helps =)
Another option is to use fdupes:
brew install fdupes
fdupes -r .
fdupes -r . finds duplicate files recursively under the current directory. Add -d to delete the duplicates; you'll be prompted about which files to keep. If you instead add -dN, fdupes will always keep the first file and delete the others.
I wrote a script that renames your files to match a hash of their contents.
It uses a subset of the file's bytes so it's fast, and if there's a collision it appends a counter to the name like this:
3101ace8db9f.jpg
3101ace8db9f (1).jpg
3101ace8db9f (2).jpg
This makes it easy to review and delete duplicates on your own, without trusting somebody else's software with your photos more than you need to.
Script: https://gist.github.com/SimplGy/75bb4fd26a12d4f16da6df1c4e506562
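The same idea can be sketched in plain shell. This is my own simplification, not the linked script: it uses cksum over the first 64 KB as a stand-in hash, a stand-in collision counter, and made-up .jpg names.

```shell
cd "$(mktemp -d)"              # scratch demo with hypothetical files
printf 'photo-a' > one.jpg
printf 'photo-a' > two.jpg     # identical bytes -> identical hash
printf 'photo-b' > three.jpg

for f in *.jpg; do
  h=$(head -c 65536 "$f" | cksum | cut -d ' ' -f 1)    # hash a prefix of the file
  target="$h.jpg"
  n=1
  while [ -e "$target" ] && [ "$target" != "$f" ]; do  # collision: append a counter
    target="$h ($n).jpg"
    n=$((n + 1))
  done
  [ "$f" = "$target" ] || mv "$f" "$target"
done
ls   # duplicates now sort next to each other, as "HASH.jpg" and "HASH (1).jpg"
```

After this, identical files share a name prefix, so reviewing and deleting them by eye is straightforward.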