Finding matching files based on content
I have to write a script that compares two directories and finds duplicate files based on content, not filename. I was thinking of using the diff command (diff -r dir1 dir2), but it produces a ton of unwanted information. What is the best way to find matching files based on the contents of the files rather than their names?
Solution 1:
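You could use a hash function like md5sum. If two files have the same hash, they are almost certainly identical; if you need to be sure, confirm the match with a byte-by-byte comparison such as cmp.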
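As a rough sketch of the hashing approach, the function below hashes every regular file under both directories and prints the groups of files that share a hash. The function name find_matching_files is made up for illustration, and the sketch assumes GNU coreutils (md5sum, and uniq with -w and --all-repeated) and file names without newlines or backslashes. It will also report duplicates that sit inside the same directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list groups of files with identical MD5 hashes.
# Assumes GNU md5sum/uniq and file names without newlines or backslashes.
find_matching_files() {
    local dir1=$1 dir2=$2
    # md5sum prints one line per file: "<32-char hash>  <path>".
    find "$dir1" "$dir2" -type f -exec md5sum {} + |
        # Sort so that lines with the same hash end up next to each other.
        LC_ALL=C sort |
        # Compare only the first 32 characters (the hash) and print every
        # line belonging to a repeated hash, with a blank line between groups.
        uniq -w32 --all-repeated=separate
}
```

Calling find_matching_files dir1 dir2 prints each group of matching files as consecutive "hash  path" lines. Files that have no match anywhere produce no output.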
Solution 2:
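You can use the -s flag for diff, which reports files that are identical. Note that diff -r only compares files that have the same name in both directories: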
diff -sqr dir1 dir2 | grep identical