Finding matching files based on content

I have to write a script that compares two directories and finds duplicate files based on content, not filename. I was thinking of using the diff command (diff -r dir1 dir2), but it produces a ton of unwanted output. What is the best way to find matching files based on their contents rather than their names?


Solution 1:

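You could use a hash tool like md5sum. Hash every file in both directories and compare the hashes: files with different hashes are certainly different, and files with the same hash are almost certainly identical. MD5 collisions can be crafted deliberately, so if that matters, use sha256sum instead or confirm each match with cmp.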
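
A minimal sketch of that idea, assuming GNU md5sum is available and that file names contain no newlines or backslashes. The function name find_matching and the "A"/"B" tags are made up for illustration; they are not part of any standard tool.

```shell
# Sketch only: print pairs of files from two directories whose MD5 hashes match.
# find_matching is a hypothetical helper name, not a standard command.
find_matching() {
    {
        # Tag each md5sum line with the directory it came from.
        find "$1" -type f -exec md5sum {} + | sed 's/^/A /'
        find "$2" -type f -exec md5sum {} + | sed 's/^/B /'
    } | awk '
        # Line layout: tag, space, 32-char hash, two separator chars, path.
        { side = $1; hash = $2; path = substr($0, 37) }
        # Remember one path per hash from the first directory
        # (if several files there share a hash, only the last is kept).
        side == "A" { first[hash] = path }
        # Report every file in the second directory whose hash was seen.
        side == "B" && (hash in first) { print first[hash] " == " path }
    '
}
```

Calling find_matching dir1 dir2 prints one line per match in the form "dir1/path == dir2/path". Files are paired purely by hash, so matches are found regardless of name or subdirectory.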

Solution 2:

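You can use the -s flag for diff, which makes it report identical files. Note that diff -r pairs files by name, so this only finds identical files that have the same relative path in both directories: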

diff -sqr dir1 dir2 | grep identical