More efficient file compression program for many identical files?

7-zip supports solid compression if I remember correctly, so it should compress a lot of nearly identical files very well.


I did some testing on the "identical files" aspect mentioned in the question, using 7-zip (version 9.20), since no one had given an elaborate answer on that yet. This gave some interesting results. I tested with 10 copies of the file that this site uses for its page-not-found message. Being a JPG file, it won't compress well on its own, so it demonstrates how efficiently multiple identical files are compressed. Its file size is 37 KB.

  1. When I compress all ten copies using the zip format, the archive is 367 KB, about 99% of the combined size of all 10 files.
  2. When I compress all ten copies using the 7z format, the archive is 37 KB, about 101% of the size of just one of the original files.
  3. If I first put 5 copies in a 7z archive, then add 3 and finally 2 copies in separate steps, the archive grows to 111 KB, about three times the size of a single original file.

If I open the 3rd archive, one of the properties is Block. This lists 0, 1 and 2 for 3, 5 and 2 of the files, respectively.
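The effect of adding files in separate steps can be approximated with Python's standard `lzma` module. This is only a rough sketch, not what 7-zip does internally: it assumes that each step produces an independent compressed stream, and it uses random bytes as a stand-in for the incompressible JPG. The function names are made up for illustration.

```python
import lzma
import os


def compressed_size(payload: bytes, copies: int) -> int:
    """Size of one LZMA stream holding `copies` back-to-back copies of payload."""
    return len(lzma.compress(payload * copies))


def blocks_size(payload: bytes, batches: list) -> int:
    """Total size when each batch is compressed as its own independent stream."""
    return sum(compressed_size(payload, n) for n in batches)


if __name__ == "__main__":
    # Random bytes stand in for a 37 KB file that does not compress on its own.
    payload = os.urandom(37 * 1024)
    print("all 10 in one step:  ", compressed_size(payload, 10))
    print("5, then 3, then 2:   ", blocks_size(payload, [5, 3, 2]))
```

With independent streams, each batch has to store the file's content once, so three batches should cost roughly three times the single-file size, in line with the 111 KB measured above.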

Observations:

  1. The zip format compresses each file individually, so it does not benefit from the possibility of efficiently compressing identical files.
  2. The 7z-format will efficiently compress multiple identical files, as long as they are added to the archive in one step.
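The difference between the two observations can be reproduced with Python's standard library alone. The sketch below is an analogy, not a re-run of the 7-zip test: it assumes that a tar file compressed as one LZMA stream behaves like a solid archive, and it uses random bytes in place of the JPG. The helper names `zip_size` and `solid_size` are hypothetical.

```python
import io
import lzma
import os
import tarfile
import zipfile


def zip_size(payload: bytes, copies: int) -> int:
    """Archive size when every copy is deflated on its own (zip behaviour)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(copies):
            zf.writestr(f"copy{i}.jpg", payload)
    return len(buf.getvalue())


def solid_size(payload: bytes, copies: int) -> int:
    """Archive size when all copies share one compressed stream (tar + LZMA)."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        for i in range(copies):
            info = tarfile.TarInfo(name=f"copy{i}.jpg")
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return len(lzma.compress(tar_buf.getvalue()))


if __name__ == "__main__":
    payload = os.urandom(37 * 1024)  # incompressible stand-in for the JPG
    print("zip, per-file:", zip_size(payload, 10))
    print("solid stream: ", solid_size(payload, 10))
```

The per-file archive should come out close to ten times the payload size, while the single stream should stay close to one payload, because later copies can refer back to the first one.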

Conclusions:

  1. For optimal compression of files, use 7z rather than zip.
  2. Compression may improve dramatically if you do not add files to an existing 7z archive, but first decompress it and then compress it again, including the new files, in one step.
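The second conclusion could be scripted roughly as follows. This is a sketch under several assumptions: the `7z` command-line tool is installed and on the PATH, the archive and file names are placeholders, and name clashes between old and new files are not handled. It uses the `x` (extract) and `a` (add) commands with the `-o`, `-y`, `-t7z` and `-ms=on` switches; solid mode is normally the default for 7z archives, so `-ms=on` is only there to make the intent explicit.

```python
import os
import shutil
import subprocess
import tempfile


def extract_command(old_archive: str, workdir: str) -> list:
    """Command line that unpacks the existing archive into workdir."""
    return ["7z", "x", os.path.abspath(old_archive), "-o" + workdir, "-y"]


def add_command(new_archive: str, names: list) -> list:
    """Command line that packs the given names into one fresh solid archive."""
    return ["7z", "a", "-t7z", "-ms=on", os.path.abspath(new_archive), *names]


def repack(old_archive: str, new_files: list, new_archive: str) -> None:
    """Unpack old_archive, add new_files, and compress everything in one step."""
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(extract_command(old_archive, workdir), check=True)
        for path in new_files:
            shutil.copy(path, workdir)
        names = sorted(os.listdir(workdir))
        # Run inside workdir so the archive stores relative names only.
        subprocess.run(add_command(new_archive, names), cwd=workdir, check=True)
```

A call such as `repack("old.7z", ["copy9.jpg", "copy10.jpg"], "repacked.7z")` (hypothetical file names) would then write a new archive in which all files are compressed together.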

Windows Vista comes with Backup and Restore Center. It will do incremental backups of your files to avoid wasting space and having to create multiple backups. From the linked page:

Previously backed-up versions of files use only a bare minimum of disk space. If only a small part of a file changes (such as one slide in a presentation), only that portion gets tracked and saved.


7-zip has one of the best compression algorithms around. I don't believe anything currently beats 7-zip's algorithm in terms of compression.