Does WinRAR detect duplicate files?

WinRAR 5.00 introduced the new RAR5 archive format, and this feature is one of its many improvements:

Save identical files as references

If this option is enabled, WinRAR analyzes the file contents before starting archiving. If several identical files larger than 64 KB are found, the first file in the set is saved as usual file and all following files are saved as references to this first file. It allows to reduce the archive size, but applies some restrictions to resulting archive. You must not delete or rename the first identical file in archive after the archive was created, because it will make extraction of following files using it as a reference impossible. If you modify the first file, following files will also have the modified contents after extracting. Extraction command must involve the first file to create following files successfully.

It is recommended to use this option only if you compress a lot of identical files, will not modify an archive later and will extract an archive entirely, without necessity to unpack or skip individual files. If all identical files are small enough to fit into compression dictionary, solid archiving can provide more flexible solution than this option.

Supported for RAR 5.0 archives only.
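
To make the restrictions above more concrete, here is a toy model of the idea in Python. It is only a sketch under my own assumptions, not WinRAR's actual implementation or the RAR5 on-disk format: the names build_manifest and extract are made up, the "archive" is just a dictionary, and SHA-256 stands in for whatever comparison WinRAR really performs.

```python
import hashlib

MIN_REF_SIZE = 64 * 1024  # the help text only covers files "larger than 64 KB"


def build_manifest(files):
    """Map each name to ("data", bytes) or ("ref", name_of_first_identical_file)."""
    first_seen = {}  # content digest -> name of the first file with that content
    manifest = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).digest()
        if len(content) > MIN_REF_SIZE and digest in first_seen:
            # Later identical file: store only a pointer to the first one.
            manifest[name] = ("ref", first_seen[digest])
        else:
            manifest[name] = ("data", content)
            first_seen.setdefault(digest, name)
    return manifest


def extract(manifest, name):
    kind, payload = manifest[name]
    if kind == "ref":
        # A reference is only usable while its target is still in the archive.
        return extract(manifest, payload)
    return payload


files = {
    "a/big.bin": b"x" * 100_000,
    "b/big_copy.bin": b"x" * 100_000,
    "c/small.txt": b"hello",
    "d/small_copy.txt": b"hello",
}
manifest = build_manifest(files)
print({name: kind for name, (kind, _) in manifest.items()})
```

In this model, removing "a/big.bin" from the manifest makes extract(manifest, "b/big_copy.bin") fail with a KeyError, which mirrors the warning about deleting or renaming the first identical file.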

My quick test on a folder that contains 320,000 files (Baldur's Gate Trilogy with a lot of mods):

RAR4 archive format, compression method set to "Store": 26.1 GB (28,053,815,768 bytes)

RAR5 archive format, compression method set to "Store" and "Save identical files as references" turned on: 23.9 GB (25,722,664,097 bytes)

So I was able to save about 2.2 GB, or 8.3% of the original size, without using any compression at all!
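
The percentage depends on which of the two sizes is taken as the baseline. A quick check of the arithmetic, using only the byte counts quoted above (the variable names are mine):

```python
before = 28_053_815_768  # RAR4, "Store"
after = 25_722_664_097   # RAR5, "Store" + identical files saved as references

saved = before - after
print(f"saved {saved:,} bytes")
print(f"{saved / before:.1%} of the RAR4 archive size")
print(f"{saved / after:.1%} of the RAR5 archive size")
```

The first percentage is the usual way to state a saving (relative to the original size); the second is how much bigger the RAR4 archive is than the RAR5 one.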


If the files are really duplicates (or near duplicates), compression software can exploit that similarity across files to greatly increase the compression ratio. This is called solid compression. WinRAR and 7-Zip are two popular archivers that support it -- 7-Zip uses it by default. I'm not a RAR user, so I can't tell you its default configuration.

Common archivers on Linux/Unix/BSD systems also implicitly do solid compression by concatenating all the files together into a single file (most often via tar) before compressing that single file as a large block.
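
A small experiment shows the effect. This is just a sketch using Python's standard lzma module rather than any particular archiver, and the random 200,000-byte payload is an assumption chosen so that each "file" is incompressible on its own:

```python
import lzma
import os

# Two "files" with identical, incompressible content (random bytes).
payload = os.urandom(200_000)
files = [payload, payload]

# Non-solid: each file is compressed on its own, so the duplicate costs full price.
separate = sum(len(lzma.compress(f)) for f in files)

# Solid: the files are concatenated first (as tar does) and compressed as one
# stream, so the second copy can be encoded as back-references to the first.
solid = len(lzma.compress(b"".join(files)))

print(f"separate: {separate:,} bytes")
print(f"solid:    {solid:,} bytes")
```

The solid result should come out at roughly half the size of the separate one. The gain only appears when the earlier copy is still inside the compressor's dictionary (window), so very large duplicates that are far apart in the stream may not benefit.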

The one giant caveat to all this is that you don't really have any way of knowing exactly which files are similar, or how similar they are. It's not a good way of finding out what duplicate files you have, and extracting the archive is going to restore all that duplication. That is normally exactly what one wants and expects from data compression: to get back exactly what was put in.

If you want to clean up your folders, you need duplicate detection software. For ordinary collections, there is plenty of software that ferrets out exact duplicate files. If you're dealing with media (audio, video, pictures), you're going to want software that doesn't just search for exact duplicates, but can fingerprint your files and find groups of files that are similar. That way, two copies of the same song with different tags or different compression (say, a 128 kbit/s MP3 and a 256 kbit/s AAC) can be identified, as can two pictures of the same subject where one has been cropped or edited. Each media type often has specialized software for finding similar files, and there have been questions here before dealing with the particulars of each type. Of course, cleaning up such collections is much more difficult and time-consuming, because there are no hard and fast rules for deciding which file should be kept.
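
For the exact-duplicate case, the usual approach is simple enough to sketch in a few lines of Python: group files by size, then hash only the files that share a size. This is a minimal illustration, not a replacement for a dedicated tool; the function names are mine, it does nothing about similar (non-identical) media files, and it only reports groups without deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups (sorted lists of paths) with identical content."""
    # Pass 1: files of different sizes cannot be identical, so bucket by size.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files whose size is shared with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling find_duplicates on a folder path returns one list per set of identical files; deciding which copy to keep is still up to you.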


WinRAR will not do what you want. However, there are other tools that can find duplicated files inside a folder or in a partition. I have needed to do such a thing before, and I used Easy Duplicate Finder software:

Easy Duplicate Finder is a powerful tool to find and resolve duplicate photos, documents, spreadsheets, MP3's, and more! Removing duplicates will also help to speed up indexing and reduces back up size and time. Your computer isn’t fully optimized until you’ve removed all unnecessary duplicate files. Let Easy Duplicate Finder remove the duplicates!