Best compression method?
I want to compress a 16GB folder, but what's the best method: tar.gz, tar.bz2, rar or 7z? Would the archive be smaller if I first compressed it with one method, copied the compressed archive to a new folder, and then re-compressed it with another method? I need to make it fit on a DVD (output maybe 8.5GB, I don't remember), but putting "4370 MB" makes the compressed file come out as a 2.5GB part.
BTW, what's the default compression method on Ubuntu?
Solution 1:
The default is gz. I get the best results with 7z, though.
Here are the results for a 1.4 GB VirtualBox container.
Best compression, size in MB (smallest first):
- 7z: 493
- rar: 523
- bz2: 592
- lzh: 607
- gz: 614
- Z: 614
- zip: 614
- arj: 615
- lzo: 737
- zoo: 890
Source
Install 7z with:
sudo apt-get install p7zip-full
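To get a rough feel for what those numbers would mean for the 16 GB folder in the question, here is a small Python sketch that scales the measured sizes. It rests on loud assumptions: "1.4 GB" is taken as 1400 MB, and the folder is assumed to compress like that VirtualBox container, which it probably won't. The names `estimate`, `FOLDER_MB` and `VOLUME_MB` are made up for this illustration.

```python
import math

# Archive sizes in MB reported above for a ~1.4 GB VirtualBox container.
ORIGINAL_MB = 1400  # assumption: "1.4 GB" taken as 1400 MB
SIZES_MB = {"7z": 493, "rar": 523, "bz2": 592, "gz": 614}


def estimate(folder_mb, fmt, volume_mb):
    """Scale the measured ratio to a folder and count the volumes needed."""
    ratio = SIZES_MB[fmt] / ORIGINAL_MB
    archive_mb = folder_mb * ratio
    return ratio, archive_mb, math.ceil(archive_mb / volume_mb)


FOLDER_MB = 16 * 1024  # the 16 GB folder from the question
VOLUME_MB = 4370       # volume size mentioned in the question

for fmt in SIZES_MB:
    ratio, archive_mb, volumes = estimate(FOLDER_MB, fmt, VOLUME_MB)
    print(f"{fmt:>3}: {ratio:.1%} of original, ~{archive_mb:.0f} MB, {volumes} volume(s)")
```

Treat the output as a ballpark only. Already-compressed content such as video, JPEG images or existing archives will shrink far less than a disk image does.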
Solution 2:
This question is very old, but perhaps somebody will find this solution useful:
Use rzip after tar. It first compresses data blocks of up to 900 MB using a dictionary method, and then hands the cleaned-up data over to bzip2. It is much faster than the other strong compression tools (bzip2, lzma), and it compresses some files even better than bzip2 or lzma.
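On the question of compressing twice: as far as I understand it, rzip's two stages pay off because the first pass looks for repeats over a much longer distance than bzip2 can see on its own. Simply feeding the output of one general-purpose compressor into another usually gains little or nothing, because the first output already looks close to random. A minimal Python sketch of that effect, using the stdlib `zlib` and `bz2` modules as stand-ins (this is not rzip, and the random-digit sample is an assumption for the demo):

```python
import bz2
import random
import zlib

# Sample input: 200,000 pseudo-random decimal digits (demo assumption).
rng = random.Random(42)
data = "".join(rng.choice("0123456789") for _ in range(200_000)).encode("ascii")

once = zlib.compress(data, 9)   # first pass: DEFLATE, the method gzip uses
twice = bz2.compress(once, 9)   # second pass: bzip2 on the compressed bytes
direct = bz2.compress(data, 9)  # bzip2 applied to the original instead

print("original      :", len(data))
print("zlib          :", len(once))
print("zlib then bz2 :", len(twice))
print("bz2 directly  :", len(direct))

# Both paths must still round-trip to the original bytes.
assert zlib.decompress(bz2.decompress(twice)) == data
assert bz2.decompress(direct) == data
```

If the "zlib then bz2" line is not noticeably smaller than the "zlib" line, the second pass bought nothing. Picking one stronger method for the original data is the better use of time.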
Yes, gz is the default compression tool on Linux. It is fast, and despite its age it still gives very good results when compressing text files like source code. Another standard tool is bzip2, though it is much slower.
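If you want to check the speed and size trade-off on your own machine, a rough Python sketch like the following times the stdlib `gzip`, `bz2` and `lzma` modules on the same input. The synthetic "source code" sample and the `measure` helper are assumptions made up for this example. Substitute your own data, since results vary a lot with the input.

```python
import bz2
import gzip
import lzma
import time

# Synthetic "source code" sample (an assumption; substitute your own files).
lines = [f"int value_{i % 50} = compute({i}, {i * 7 % 13});\n" for i in range(50_000)]
data = "".join(lines).encode("ascii")


def measure(name, compress, decompress):
    """Compress once, verify the round trip, and report size and wall time."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == data
    print(f"{name:5s} {len(packed):>9d} bytes  {elapsed:.3f} s")
    return len(packed)


sizes = {
    "gzip": measure("gzip", gzip.compress, gzip.decompress),
    "bz2": measure("bz2", bz2.compress, bz2.decompress),
    "lzma": measure("lzma", lzma.compress, lzma.decompress),
}
```

The timings cover a single run with each module's default settings, so they are only a first impression, not a benchmark.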
Addition: lrzip is newer and extends the principle of rzip. It even supports unlimited block sizes and a choice of compression methods (LZMA, Bzip2, Gzip, LZO, ZPAQ or none). LZMA is the default. For backups, or if you share a lot of data with other Linux/BSD users, it can come in really handy.
Solution 3:
I opt for LZMA. It has the smallest byte overhead and a strong compression ratio. Here is a comparison between ZIP and LZMA:
I generated two test files with PHP, each holding about 1.1 MB of data.
seq.txt holds repeating blocks of the digits 0..9:
$s = '0123456789';
$str = '';
// one digit per step, with a newline after every 10th digit
for ($i=0; $i < 1000000; $i++)
  $str .= $s[$i%10].($i%10==9 ? "\n":"");
file_put_contents('seq.txt', $str);
rnd.txt holds random blocks of the digits 0..9:
$s = '0123456789';
$str = '';
// same layout, but each digit is picked at random
for ($i=0; $i < 1000000; $i++)
  $str .= $s[rand(0,9)].($i%10==9 ? "\n":"");
file_put_contents('rnd.txt', $str);
Compression results:
- seq.txt, rnd.txt - 1100000 bytes
- seq.txt.zip - 2502 bytes
- rnd.txt.zip - 515957 bytes
- seq.txt.lzma - 257 bytes
- rnd.txt.lzma - 484939 bytes
Compression ratio (space saved relative to the original size):
- ZIP -> "seq.txt" -> 99.772%
- ZIP -> "rnd.txt" -> 53.094%
- LZMA -> "seq.txt" -> 99.976%
- LZMA -> "rnd.txt" -> 55.914%
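So LZMA compressed the sequential data about 0.2 percentage points more effectively than ZIP, and the random data about 2.8 percentage points more effectively. In absolute terms the gap on seq.txt is larger than it looks: 257 bytes versus 2502 bytes, roughly a tenth of the size.
LZMA clearly wins!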
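For anyone who wants to repeat the test without PHP, here is a rough Python equivalent. It is a sketch under assumptions: the stdlib `zipfile` (DEFLATE) and `lzma` modules stand in for whatever tools produced the numbers above, and Python's random generator differs from PHP's `rand`, so the byte counts will not match exactly. The helper names `make_digits`, `zip_size` and `savings` are made up for this example.

```python
import io
import lzma
import random
import zipfile


def make_digits(n, pick):
    """Return n digits with a newline after every 10th, like the PHP snippets."""
    out = []
    for i in range(n):
        out.append(pick(i))
        if i % 10 == 9:
            out.append("\n")
    return "".join(out).encode("ascii")


def zip_size(name, data):
    """Size of a one-file ZIP archive (DEFLATE) built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return len(buf.getvalue())


def savings(original, compressed):
    """Space saved as a percentage: 100 * (1 - compressed / original)."""
    return 100.0 * (1 - compressed / original)


rng = random.Random(0)  # fixed seed so runs are repeatable
files = {
    "seq.txt": make_digits(1_000_000, lambda i: "0123456789"[i % 10]),
    "rnd.txt": make_digits(1_000_000, lambda i: rng.choice("0123456789")),
}

for name, data in files.items():
    z = zip_size(name, data)
    x = len(lzma.compress(data, format=lzma.FORMAT_ALONE))  # legacy .lzma container
    print(f"{name}: {len(data)} bytes, ZIP {z} ({savings(len(data), z):.3f}%), "
          f"LZMA {x} ({savings(len(data), x):.3f}%)")
```

The `savings` function is the same calculation as in the ratio list above: for example, 2502 bytes out of 1100000 is a saving of about 99.77%.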