What is the fastest compression method for a large number of files?

I need to compress a directory with around 350,000 fairly small files that amount to about 100GB total. I am using OSX and am currently using the standard "Compress" tool that converts this directory into a .zip file. Is there a faster way to do this?


Solution 1:

For directories I'd use tar piped to bzip2 at maximum compression.

The simple way to go is:

tar cjf archive.tar.bz2 dir-to-be-archived/

(Note the flag order: 'f' must come last, because it takes the archive name as its argument.)

This works great if you don't intend to fetch small sets of files out of the archive
and are just planning to extract the whole thing whenever/wherever required.
Still, if you do need to pull out a small set of files, it's not too bad.

I prefer to call such archives filename.tar.bz2 and extract with the 'xjf' options.
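For example, here is how pulling a single file back out of such an archive works; the directory and file names below are just placeholders for illustration:

```shell
# Build a tiny sample tree and archive it (all names here are hypothetical):
mkdir -p demo/sub
echo "hello" > demo/sub/file.txt
tar cjf demo.tar.bz2 demo/

# List the archive's contents to find the exact stored path:
tar tjf demo.tar.bz2

# Remove the original, then restore just the one file by its stored path:
rm -r demo
tar xjf demo.tar.bz2 demo/sub/file.txt
cat demo/sub/file.txt    # -> hello
```

The catch is that tar has no index, so extracting one file still requires scanning (and decompressing) the archive up to that file's position.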

The max-compression pipe looks like this,

tar cf - dir-to-be-archived/ | bzip2 -9 - > archive.tar.bz2
#      ^ write tarball to stdout ^ compress it into the archive file.

Note: bzip2, especially at higher compression levels, tends to be slower than regular gzip ('tar czf').
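If speed matters more than the last few percent of compression ratio, the gzip equivalents look like this (directory name is hypothetical):

```shell
# Sample directory to archive (hypothetical names):
mkdir -p dir-to-be-archived
echo data > dir-to-be-archived/a.txt

# gzip via tar's 'z' flag -- faster than bzip2, somewhat larger output:
tar czf archive.tar.gz dir-to-be-archived/

# Or pipe explicitly and use gzip's fastest level, trading ratio for speed:
tar cf - dir-to-be-archived/ | gzip -1 > archive-fast.tar.gz
```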

If you have a fast network and the archive is going to be placed on a different machine,
you can speed up with a pipe across the network (effectively using two machines together).

tar cf - dir/ | ssh user@server "bzip2 -9 - > /target-path/archive.tar.bz2"
#      ^ pipe tarball over the network ^ compress and archive on the remote machine.

Some references,

  1. Linux Journal: Compression Tools Compared, Jul 28, 2005
    • this also refers the MaximumCompression site mentioned by Dennis
  2. gzip vs. bzip2, Aug 26, 2003
  3. A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA, May 31, 2005

Solution 2:

This guy did some research on that. It appears that .zip compresses large files faster, but it also yields one of the largest archive sizes. He was using Windows utilities, but I'd bet OSX's built-in tool is similarly optimized.

Here is an excellent website where numerous compression utilities have been benchmarked for speed over many files. There are many other tests on that site you can look at to determine the best utility for you.

Much of the speed depends on the program you use. I've used 7-Zip on Windows and find it very fast, but compressing this many files takes a long time no matter what, so I'd just let it run overnight. Alternatively, you could tar the whole thing without compressing it. Personally, I hate unzipping large archives, so be careful if that's what you plan to do.
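The uncompressed-tar option is worth spelling out: with no compression step, the cost is pure I/O, which is typically the fastest way to bundle hundreds of thousands of small files into one. The directory name below is hypothetical:

```shell
# Sample directory standing in for the real 350,000-file tree:
mkdir -p bigdir
echo x > bigdir/f1.txt

# Plain tar, no compression flag: just concatenates files with headers.
tar cf archive.tar bigdir/

# Verify the contents without extracting:
tar tf archive.tar
```

You can always compress the resulting .tar later (e.g. with gzip or bzip2) if the size turns out to matter.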