Multi-Core Compression tools
Well, the keyword was parallel. After looking for all compression tools that were also parallel I found the following:
PXZ - Parallel XZ is a compression utility that takes advantage of running LZMA compression of different parts of an input file on multiple cores and processors simultaneously. Its primary goal is to utilize all resources to speed up compression time with minimal possible influence on compression ratio.
sudo apt-get install pxz
PLZIP - Lzip is a lossless data compressor based on the LZMA algorithm, with very safe integrity checking and a user interface similar to the one of gzip or bzip2. Lzip decompresses almost as fast as gzip and compresses better than bzip2, which makes it well suited for software distribution and data archiving.
Plzip is a massively parallel (multi-threaded) version of lzip using the lzip file format; the files produced by plzip are fully compatible with lzip.
Plzip is intended for faster compression/decompression of big files on multiprocessor machines, which makes it specially well suited for distribution of big software files and large scale data archiving. On files big enough, plzip can use hundreds of processors.
sudo apt-get install plzip
PIGZ - pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that takes advantage of multiple processors and multiple cores when compressing data.
sudo apt-get install pigz
PBZIP2 - pbzip2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 (ie: anything compressed with pbzip2 can be decompressed with bzip2).
sudo apt-get install pbzip2
LRZIP - A multithreaded compression program that can achieve very high compression ratios and speed when used with large files. It uses the combined compression algorithms of zpaq and lzma for maximum compression, lzo for maximum speed, and the long range redundancy reduction of rzip. It is designed to scale with increases in RAM size, improving compression further. A choice of either size or speed optimizations allows for either better compression than even lzma can provide, or better speed than gzip, but with bzip2 sized compression levels.
sudo apt-get install lrzip
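Once installed, most of these tools take an option that sets the number of worker threads. The following is only a rough sketch: the file name sample.dat is made up, only a subset of the tools is shown, and each tool is skipped if it is not installed. Check each tool's man page before relying on the exact options.

```shell
#!/bin/sh
# Sketch: compress one file with each parallel tool that happens to be installed.
# "sample.dat" is a made-up file name, created here so the script is self-contained.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
head -c 1048576 /dev/zero > sample.dat

# Number of online cores; falls back to 1 where getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# -c writes to stdout, so the input file is left in place.
if command -v pigz >/dev/null;   then pigz   -p "$cores" -c sample.dat > sample.dat.gz;  fi
if command -v pbzip2 >/dev/null; then pbzip2 -p"$cores"  -c sample.dat > sample.dat.bz2; fi
if command -v plzip >/dev/null;  then plzip  -n "$cores" -c sample.dat > sample.dat.lz;  fi
if command -v xz >/dev/null;     then xz     -T "$cores" -c sample.dat > sample.dat.xz;  fi

# Show the input next to whatever compressed copies were produced.
ls -l sample.dat*
```

Writing to stdout with -c is a deliberate choice here so that every tool starts from the same untouched input file.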
A small Compression Benchmark (Using the test Oli created):
ORIGINAL FILE SIZE - 100 MB
PBZIP2 - 101 MB (1% Bigger)
PXZ - 101 MB (1% Bigger)
PLZIP - 102 MB (1% Bigger)
LRZIP - 101 MB (1% Bigger)
PIGZ - 101 MB (1% Bigger)
A small Compression Benchmark (Using a Text file):
ORIGINAL FILE SIZE - 70 KB Text File
PBZIP2 - 16.1 KB (23%)
PXZ - 15.4 KB (22%)
PLZIP - 15.5 KB (22.1%)
LRZIP - 15.3 KB (21.8%)
PIGZ - 17.4 KB (24.8%)
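The percentages above are simply the compressed size divided by the original size. A small helper along these lines can compute that figure for any pair of files. The function name size_pct and the placeholder files are assumptions for illustration, and the helper rounds to one decimal place, so the last digit may differ slightly from figures worked out by hand.

```shell
#!/bin/sh
# size_pct ORIGINAL COMPRESSED
# Prints the compressed size as a percentage of the original, to one decimal place.
size_pct() {
    orig=$(wc -c < "$1")
    comp=$(wc -c < "$2")
    awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.1f\n", c / o * 100 }'
}

# Demo with two placeholder files of known size (1000 and 250 bytes).
dir=$(mktemp -d)
head -c 1000 /dev/zero > "$dir/original"
head -c 250  /dev/zero > "$dir/compressed"
size_pct "$dir/original" "$dir/compressed"   # prints 25.0
```

In practice you would pass the real input and the output of one of the compressors, for example size_pct notes.txt notes.txt.xz (hypothetical file names).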
There are two main tools: lbzip2 and pbzip2. They're essentially different implementations of bzip2 compressors. I've compared them (the output is a tidied up version but you should be able to run the commands):
cd /dev/shm # we do all of this in RAM!
dd if=/dev/urandom of=bigfile bs=1024 count=102400
$ lbzip2 -zk bigfile
Time: 0m3.596s
Size: 105335428
$ pbzip2 -zk bigfile
Time: 0m5.738s
Size: 10532460
lbzip2 appears to be the winner on random data. It's slightly less compressed but much quicker. YMMV.
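To repeat this kind of comparison yourself, something like the loop below may help. It is a sketch under a few assumptions: it uses a 1 MiB random file instead of 100 MiB to keep the run short, it works in a temporary directory rather than /dev/shm, it adds plain bzip2 as a baseline, and it times with date +%s, which only has whole-second resolution. Raise the dd count for timings that mean anything.

```shell
#!/bin/sh
# Sketch: time each installed bzip2-compatible compressor on the same random input.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
dd if=/dev/urandom of=bigfile bs=1024 count=1024 2>/dev/null

for tool in lbzip2 pbzip2 bzip2; do
    # Skip tools that are not installed instead of failing.
    command -v "$tool" >/dev/null || { echo "$tool: not installed"; continue; }
    start=$(date +%s)
    # -z compress, -k keep the input, -c write to stdout
    "$tool" -zkc bigfile > "bigfile.$tool.bz2"
    end=$(date +%s)
    echo "$tool: $((end - start))s, $(wc -c < "bigfile.$tool.bz2") bytes"
done
```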
Update:
XZ Utils has supported multi-threaded compression since v5.2.0; it was originally mistakenly documented as multi-threaded decompression.
For example: tar -cf - source | xz --threads=0 > destination.tar.xz
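The same tar pipeline can be checked end to end, and it works with the other parallel tools too. This is a minimal sketch: the directory contents are placeholders, the script runs in a temporary directory, and each compressor is only used if it is installed.

```shell
#!/bin/sh
# Sketch: pipe tar through a parallel compressor and read the archive back.
# The directory and file names are placeholders, not from the original answer.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
mkdir source && echo "hello" > source/example.txt

if command -v xz >/dev/null; then
    # --threads=0 asks xz to use as many threads as there are cores.
    tar -cf - source | xz --threads=0 > destination.tar.xz
    xz --test destination.tar.xz            # verify integrity
    xz -dc destination.tar.xz | tar -tf -   # list contents without extracting
fi

if command -v pigz >/dev/null; then
    # Same pattern with pigz, which uses all cores by default.
    tar -cf - source | pigz > destination.tar.gz
    pigz -dc destination.tar.gz | tar -tf -
fi
```

Listing the archive through the decompressor is a cheap way to confirm that both the compression layer and the tar stream are readable.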