Linux tools to find duplicate files?
I have a large and growing set of text files, which are all quite small (less than 100 bytes). I want to diff each possible pair of files and note which are duplicates. I could write a Python script to do this, but I'm wondering if there's an existing Linux command-line tool (or perhaps a simple combination of tools) that would do this?
Update (in response to mfinni's comment): The files are all in a single directory, so they all have different filenames. (But they all have a filename extension in common, making it easy to select them all with a wildcard.)
Solution 1:
There's fdupes. But I usually use a combination of find . -type f -exec md5sum '{}' \; | sort | uniq -d -w 32 (an MD5 hex digest is 32 characters, so -w 32 compares only the hash; -w 36 would also compare the separator and the first characters of the filename, causing duplicates with different names to be missed).
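Note that uniq -d prints just one representative line per group of duplicates. If you want to see every file in each duplicate group, here is a minimal sketch of the same approach, assuming GNU coreutils (the '*.txt' pattern is a stand-in for whatever extension your files share):

    #!/bin/sh
    # Hash every matching file, sort so identical hashes end up adjacent,
    # then print all lines whose first 32 characters (the MD5 digest) repeat.
    # --all-repeated=separate puts a blank line between duplicate groups.
    find . -type f -name '*.txt' -exec md5sum {} + \
        | sort \
        | uniq -w 32 --all-repeated=separate

Using -exec md5sum {} + batches many files into each md5sum invocation, which is noticeably faster than \; (one process per file) on a large set of small files.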
Solution 2:
Well, there's FSlint, which I haven't used for this particular case, but it should be able to handle it: http://en.flossmanuals.net/FSlint/Introduction
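FSlint is primarily a GUI, but its duplicate finder is also available as a standalone script, findup, so you can script it. A minimal sketch, assuming a typical install location (the path varies by distribution):

    # findup is FSlint's duplicate-file finder; adjust the path for your distro.
    /usr/share/fslint/fslint/findup /path/to/your/directory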