Search file duplicates in OSX by hash

You might also use fdupes. It doesn't have an option to search for duplicates of a specific file, but you can just grep the output for the filename:

fdupes -r1 .|grep filename

-r recurses into directories and -1 prints each group of duplicate files on a single line.

Other useful examples:

fdupes -r . finds all duplicate files under the current directory;

fdupes -r . -dN deletes all except the first duplicate from each group of duplicates;

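fdupes -r dir1 dir2 | grep '^dir1/' | xargs rm removes duplicates in dir1. Note that this breaks on file names containing spaces, and that it removes every copy when all files in a duplicate group are inside dir1.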
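The last pipeline deletes every dir1 path that fdupes prints, even when a group has no copy outside dir1. A minimal sketch of a more cautious filter follows, assuming bash and fdupes' default output (one path per line, groups separated by blank lines). The function name removable_under is hypothetical and not part of fdupes.

```shell
#!/bin/bash
# Hypothetical filter: read fdupes output on stdin (one path per line,
# duplicate groups separated by blank lines) and print the paths that start
# with PREFIX, but only for groups that also contain a path outside PREFIX.
removable_under() {
    awk -v prefix="$1" '
        # Print the collected paths if the group had a copy elsewhere.
        function emit(   i) {
            if (outside) for (i = 1; i <= n; i++) print inside[i]
            n = 0; outside = 0
        }
        /^$/ { emit(); next }                            # blank line ends a group
        index($0, prefix) == 1 { inside[++n] = $0; next } # path under PREFIX
        { outside = 1 }                                   # path somewhere else
        END { emit() }
    '
}
```

It could be used like this, converting newlines to NUL bytes so that xargs copes with spaces in file names:

fdupes -r dir1 dir2 | removable_under dir1/ | tr '\n' '\0' | xargs -0 rm

Replace rm with echo first to check what would be deleted.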

You can install fdupes with brew install fdupes.

You can easily build this yourself with some shell commands:

find ~ -type f -exec md5 -r '{}' \; > /tmp/md5.list

will build a list of MD5 hashes of all your files.

grep $(md5 -q FILE-TO-SEARCH) /tmp/md5.list

will search that list for the MD5 hash of FILE-TO-SEARCH.

Running the first command (especially if you run it across the whole disk) will take a long time though.
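Since building the list is the slow part, the two steps can be wrapped in functions so the list is built once and queried many times. This is only a sketch, assuming bash. The names hash_of, build_index and find_in_index are made up for illustration, and the fallback to md5sum on systems without md5 is my own addition.

```shell
#!/bin/bash
# Hypothetical helper: print only the MD5 digest of a file.
# Uses macOS `md5 -q` when available, otherwise falls back to `md5sum`.
hash_of() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum < "$1" | cut -d' ' -f1
    fi
}

# Print one "hash path" line for every regular file under a directory.
# NUL-separated names keep paths with spaces intact (paths containing
# newlines would still corrupt the line-based list).
build_index() {
    find "$1" -type f -print0 |
    while IFS= read -r -d '' f; do
        printf '%s %s\n' "$(hash_of "$f")" "$f"
    done
}

# Print the paths in the list whose hash equals the hash of the given file.
find_in_index() {
    local h
    h=$(hash_of "$1")
    [ -n "$h" ] || return 1
    # Match the hash only at the start of a line, then strip it off.
    grep "^$h " "$2" | cut -d' ' -f2-
}
```

Usage would look like:

build_index ~ > /tmp/md5.list
find_in_index FILE-TO-SEARCH /tmp/md5.list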

If you only want to search for one file, you can also use

SIZE=$(stat -f '%z' FILE-TO-SEARCH)
MD5=$(md5 -q FILE-TO-SEARCH)
# IFS= and -r keep leading spaces and backslashes in file names intact
find ~ -type f -size "${SIZE}c" | while IFS= read -r f; do
    [[ $MD5 = $(md5 -q "$f") ]] && echo "$f"
done
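The same size-then-hash search can be packaged as a reusable function. A rough sketch, assuming bash: the names find_copies and hash_of are hypothetical, wc -c stands in for stat -f '%z' so the size lookup also works outside macOS, and the md5sum fallback is likewise an assumption.

```shell
#!/bin/bash
# Hypothetical helper: print only the MD5 digest of a file
# (macOS `md5 -q`, falling back to `md5sum` elsewhere).
hash_of() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum < "$1" | cut -d' ' -f1
    fi
}

# Usage: find_copies FILE [DIR]   (DIR defaults to the home directory)
# Prints every file under DIR with the same size and MD5 hash as FILE,
# including FILE itself if it lives under DIR.
find_copies() {
    local target=$1 dir=${2:-$HOME} size hash f
    [ -f "$target" ] || return 1
    size=$(wc -c < "$target" | tr -d ' ')   # size in bytes
    hash=$(hash_of "$target")
    # Only files of exactly the same size are hashed at all.
    find "$dir" -type f -size "${size}c" -print0 |
    while IFS= read -r -d '' f; do
        if [ "$(hash_of "$f")" = "$hash" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example:

find_copies FILE-TO-SEARCH ~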

This should work if you substitute the size and hash of FILE_001 into the command.

The file size I used is 198452 bytes and the file's MD5 hash is 3915dc84b4f464d0d550113287c8273b:

find . -type f -size 198452c -exec md5 -r {} \; |
    grep '^3915dc84b4f464d0d550113287c8273b ' | cut -d' ' -f2-

The output will be a list of files with path names relative to the directory passed to the find command.

This approach has the advantage that it will only hash files that match the size of your original and will only output file names that match the hash.
