Between xz, gzip, and bzip2, which compression algorithm is the most efficient?

In my stress test, I compressed 464 MB of data with each of the three formats. Gzip produced a 364 MB file, bzip2 a 315 MB file, and xz a 254 MB file. I also ran a simple speed test, with results ranked from fastest to slowest:

Compression:

1: Gzip

2: Xz

3: Bzip2 (my fan was blowing quite a bit while this was going, indicating that my Athlon II was fairly strained)

Decompression:

1: Xz

2: Gzip

3: Bzip2

Please note that all of these tests were done with the latest version of 7-Zip.

Xz is the best format for well-rounded compression, while Gzip is very good for speed. Bzip2 is decent for its compression ratio, although xz should probably be used in its place.
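For anyone who wants to repeat a run like this, here is a minimal shell sketch (the sample file is a placeholder for your own data; gzip, bzip2, and xz are assumed to be on PATH):

```shell
# Create a compressible sample file (repeated text compresses well).
yes "the quick brown fox jumps over the lazy dog" | head -n 20000 > sample.dat

# Compress with each tool at its maximum level, keeping the original (-k).
# Prefix each line with `time` to measure speed as well.
gzip  -k -9 sample.dat   # -> sample.dat.gz
bzip2 -k -9 sample.dat   # -> sample.dat.bz2
xz    -k -9 sample.dat   # -> sample.dat.xz

# Compare the resulting sizes.
ls -l sample.dat sample.dat.gz sample.dat.bz2 sample.dat.xz
```

On repetitive input like this, the size ordering usually matches the one above: xz smallest, then bzip2, then gzip.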


I think that this article provides very interesting results.

http://pokecraft.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

The most size-efficient formats are xz and lzma, both with the -e parameter passed.

The fastest algorithms are by far lzop and lz4, which can reach a compression level not far from gzip's in 1.3 seconds, while gzip took 8.1 seconds. The compression ratio is 2.8 for lz4 versus 3.7 for gzip.

Here are a few results I extracted from this article:

  • Gzip: 8.1s @ 3.7

  • lz4: 1.3s @ 2.8

  • xz: 32.2s @ 5.43

  • xz -e: 6m40s @ 7.063

  • lzma -e: 4m51s @ 7.063

So if you really desperately need speed, lz4 is awesome and still provides a 2.8 compression ratio.
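A quick sketch of lz4 in practice (the lz4 CLI is assumed to be installed; app.log is a placeholder file):

```shell
# Build a repetitive placeholder file to stand in for real log data.
yes "some very repetitive log line" | head -n 50000 > app.log

lz4 -k app.log app.log.lz4        # very fast compression, modest ratio
lz4 -d app.log.lz4 restored.log   # decompression is faster still
cmp app.log restored.log && echo "round-trip OK"
```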

If you desperately need to save every byte, xz at the maximum compression level (9) does the best job for text files like the kernel source. However, it is very slow and uses a lot of memory.
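For example (big.tar is a placeholder standing in for something like a kernel source tarball; note that xz -9 needs on the order of 700 MB of RAM to compress):

```shell
# Placeholder input file; substitute your real tarball.
yes "int main(void) { return 0; }" | head -n 100000 > big.tar

xz -k -9e big.tar    # slowest setting, smallest output
xz -l big.tar.xz     # compare compressed vs. uncompressed sizes
```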

A good choice when you need to minimize the impact on both time AND space is gzip. This is the one I would use for manual daily backups of a production environment.
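Such a daily backup could look like this (prod-data is an example directory name; tar and gzip are assumed available):

```shell
# prod-data stands in for the directory you want to back up.
mkdir -p prod-data && echo "config" > prod-data/app.conf

tar -czpf "backup-$(date +%F).tar.gz" prod-data   # -z gzip, -p keep permissions
tar -tzf  "backup-$(date +%F).tar.gz"             # list contents to verify
```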


I did my own benchmark on a 1.1 GB Linux installation vmdk image:

rar    =260MB   comp= 85s   decomp= 5s
7z(p7z)=269MB   comp= 98s   decomp=15s
tar.xz =288MB   comp=400s   decomp=30s
tar.bz2=382MB   comp= 91s   decomp=70s
tar.gz =421MB   comp=181s   decomp= 5s

All compression levels at maximum; CPU: Intel i7-3740QM, memory: 32 GB @ 1600, source and destination on a RAM disk.

I generally use rar or 7z for archiving normal files like documents. For archiving system files I use .tar.gz or .tar.xz, via file-roller or via tar with the -z or -J option along with --preserve, to compress natively with tar and preserve permissions (alternatively, .tar.7z or .tar.rar can be used).

Update: since tar only preserves normal permissions and not ACLs anyway, plain .7z plus manually backing up and restoring permissions and ACLs via getfacl and setfacl can also be used. This seems to be the best option for both file archiving and system-file backups, because it fully preserves permissions and ACLs, and offers checksums, integrity testing, and encryption. The only downside is that p7zip is not available everywhere.
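That workflow can be sketched roughly like this (mydir is an example directory; p7zip and the acl tools are assumed installed):

```shell
# Example directory to back up.
mkdir -p mydir && echo "hi" > mydir/f.txt

getfacl -R mydir > mydir.acl   # dump permissions and ACLs recursively
7z a mydir.7z mydir            # archive (checksummed; add -p to encrypt)

# Later, on restore:
7z x mydir.7z -orestore
cd restore && setfacl --restore=../mydir.acl
```

The getfacl dump uses paths relative to where it was run, so setfacl --restore must be run from the matching directory, as above.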


The question is from 2014, but in the meantime there have been some trends. bzip2 has been made mostly obsolete by xz, and zstd is likely the best for most workflows.

  • Minimum file size: xz is still the best when it comes to minimal file sizes. Compression is fairly expensive, though, so faster compression algorithms are better suited if that is a concern. The pxz implementation supports multiple cores, which can speed up xz compression a bit.

  • Optimizing for fast compression: If you are optimizing primarily for compression speed, there is no clear winner in my opinion, but lz4 is a good candidate.

  • Best trade-off: If you need to pick a good overall algorithm without knowing too much about the scenario, then zstd shines. When configured to run at the same speed as gzip, it easily beats it on size; at higher compression settings it approaches xz while remaining faster. So, if you need a dependable algorithm for a broad set of use cases, zstd will most likely outperform the others. It also has some advanced features, like the ability to build an external dictionary, so it can be further optimized for specific domains.

  • Maximum compatibility: If you need an algorithm that any application will be able to understand, then gzip is still the best default. It is mostly obsolete compared to zstd, but almost any environment can work with gzip, while support for zstd is still not universal (as of 2021). zstd was released in 2016, while gzip dates from 1992.
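Two of the points above can be sketched in shell (file names are placeholders; xz 5.2+, which has built-in threading via -T, and the zstd CLI are assumed installed):

```shell
# Repetitive placeholder file standing in for real data.
yes "GET /index.html HTTP/1.1 200" | head -n 100000 > access.log

# Multi-threaded xz: -T0 uses all available cores.
xz -k -T0 access.log                   # -> access.log.xz

# zstd: level 3 is roughly gzip-class speed; level 19 approaches
# xz-class ratios while staying faster.
zstd -q -k -3  access.log -o fast.zst
zstd -q -k -19 access.log -o small.zst

# Dictionary training for many small, similar files would look like:
#   zstd --train samples/* -o my.dict
#   zstd -D my.dict some-file.json
ls -l access.log.xz fast.zst small.zst
```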