Is there any way to find similar files (not duplicates)?

Simian does this for the source code of some languages. It is best at finding blatant copy-and-paste coding. Its development seems to have stalled, but it works well enough.
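To give a rough idea of what copy-and-paste detection involves, here is a minimal Python sketch. It is not Simian's actual algorithm, only an illustration of the general idea: normalize each line, then look for runs of consecutive lines that occur in more than one place. The function name `find_copied_blocks`, the `window` parameter and the file names are made up for this example.

```python
from collections import defaultdict


def find_copied_blocks(sources, window=4):
    """Find runs of `window` consecutive lines that occur more than once.

    sources: dict mapping a (hypothetical) file name to its source text.
    Returns {block: [(name, start_line), ...]} with 1-based line numbers.
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        # Normalize: strip surrounding whitespace and skip blank lines,
        # but remember each line's original number for reporting.
        lines = [(number, line.strip())
                 for number, line in enumerate(text.splitlines(), 1)
                 if line.strip()]
        for start in range(len(lines) - window + 1):
            block = tuple(content for _, content in lines[start:start + window])
            seen[block].append((name, lines[start][0]))
    # Keep only blocks that were seen in more than one location.
    return {block: places for block, places in seen.items() if len(places) > 1}


if __name__ == "__main__":
    demo = {
        "a.py": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
        "b.py": "import os\n  x = 1\n\ny = 2\nz = x + y\nprint(z)\nprint('done')\n",
    }
    for block, places in find_copied_blocks(demo).items():
        print(places, block)
```

A sketch like this only catches pasted code that differs in whitespace. Renamed variables or reordered statements would need token-level normalization.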


(For Windows)

The product Anti-Twin (free for private use) claims to be able to do this:

If you want Anti-Twin not only to search for full duplicates but also to similar files, you can reduce the desired minimum match from the default value of 100% to up to 60%. This function has been particularly designed for the search of almost identical files where only a tiny detail was changed. Anti-Twin uses the similarity search as soon as you enter a value below 100%. The similarity comparison takes much longer than the 100% full duplicate search!

Unfortunately, the similarity search as part of the byte-by-byte comparison only makes sense for a few file types, because a similarity can only be detected if the files are uncompressed and unencrypted. Uncompressed files are e.g. unformatted texts (.TXT) and HTML.
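If you just want to try the idea on uncompressed text files without installing anything, a small Python sketch along the same lines is below. This is not how Anti-Twin works internally (I don't know its algorithm). It uses the standard library's `difflib.SequenceMatcher` as a stand-in, and the 0.6 threshold simply mirrors the 60% figure quoted above. The names `similarity` and `similar_pairs` and the sample file names are made up for the example.

```python
import difflib
from itertools import combinations


def similarity(a: bytes, b: bytes) -> float:
    """Return a similarity ratio between 0.0 (nothing shared) and 1.0 (identical)."""
    # autojunk=False switches off difflib's "popular element" heuristic,
    # which can distort the ratio for inputs longer than 200 items.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def similar_pairs(contents, min_match=0.6):
    """Compare every pair of entries and keep those at or above min_match.

    contents: dict mapping a (hypothetical) file name to its raw bytes.
    Returns a list of (name_a, name_b, ratio) tuples.
    """
    pairs = []
    for name_a, name_b in combinations(sorted(contents), 2):
        ratio = similarity(contents[name_a], contents[name_b])
        if ratio >= min_match:
            pairs.append((name_a, name_b, ratio))
    return pairs


if __name__ == "__main__":
    demo = {
        "a.txt": b"The quick brown fox jumps over the lazy dog.\n",
        "b.txt": b"The quick brown fox jumps over the lazy cat.\n",
        "c.txt": b"0123456789",
    }
    for name_a, name_b, ratio in similar_pairs(demo):
        print(f"{name_a} ~ {name_b}: {ratio:.0%}")
```

For real files you could fill the dict with `pathlib.Path(p).read_bytes()`. Be aware that this compares every file against every other one and that `SequenceMatcher` itself is slow on large inputs, so it is only practical for a modest number of small files. As noted above, it is also meaningless for compressed or encrypted formats.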