How to obtain maximum compression with .tar.gz? [duplicate]
Solution 1:
You can tell tar to use maximum compression by setting the GZIP environment variable:
export GZIP=-9
tar cvzf file.tar.gz /path/to/directory
Additionally, to keep your environment free of clutter, you can set the variable for that one command only:
env GZIP=-9 tar cvzf file.tar.gz /path/to/directory
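To make the difference between the two forms concrete, here is a minimal sketch showing that a variable passed through env is visible to the command it launches but is not left behind in the calling shell. The variable names inside and after are made up for the demonstration. Note also that recent gzip releases document the GZIP variable as obsolescent and may print a warning when it is set, so treat this approach as something to check against your gzip version.

```shell
#!/bin/sh
# Sketch: env sets GZIP for one command only; the calling shell is untouched.
unset GZIP

# The child process started by env sees the variable...
inside=$(env GZIP=-9 sh -c 'printf "%s" "$GZIP"')

# ...but the calling shell still has no GZIP variable afterwards.
after=${GZIP:-unset}

echo "inside the command: $inside"
echo "in the calling shell: $after"
```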
Solution 2:
As you stated, "tar can also compress", which implies that tar does not always compress data by itself. It does so only when used with the z option, and even then not by itself, but by passing the tarred data through gzip.
Instead, as noted in this answer, you can pipe the two commands, tar and gzip, so that you can explicitly specify the compression level for the gzip command and achieve the smallest output size.
tar cvf - /path/to/directory | gzip -9 - > file.tar.gz
Here, 9 specifies the maximum possible compression level.
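To see what the level actually buys you on your own data, the pipe can be run at several levels and the sizes compared. The following is only a sketch: the sample data, the temporary directory and the file names are all invented for the demonstration, and the numbers will differ for real directories.

```shell
#!/bin/sh
# Sketch: build the same archive at three gzip levels and print the sizes.
# The sample data and all paths here are made up for the demonstration.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Generate some compressible text to archive
mkdir "$workdir/data"
i=0
while [ "$i" -lt 2000 ]; do
  echo "line $i: the quick brown fox jumps over the lazy dog"
  i=$((i + 1))
done > "$workdir/data/sample.txt"

for level in 1 6 9; do
  archive="$workdir/data-$level.tar.gz"
  # Same pipe as above, with the level passed to gzip explicitly
  tar cf - -C "$workdir" data | gzip "-$level" > "$archive"
  # Verify that the gzip stream is intact before reporting its size
  gzip -t "$archive"
  echo "gzip -$level: $(($(wc -c < "$archive"))) bytes"
done
```

Replace the generated directory with your own path to judge whether -9 is worth the extra CPU time for your files.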
Solution 3:
Usually neither gzip nor tar can create "the absolute smallest tar.gz". There are many compression utilities that can compress to the gz format. I have written a bash script "gz99" to try gzip, 7z and advdef to get the smallest file. To use this to create the smallest possible file, run:
tar c path/to/data | gz99 file.gz
The advdef utility from AdvanceCOMP usually gives the smallest file, but is also buggy (the gz99 utility checks that it hasn't corrupted the file before accepting the output of advdef). To use advdef directly, create file.tar.gz however you feel like. Then run:
advdef -z -4 file.tar.gz
This will create a standard gz file that can be read by gzip and tar as normal, just a tiny bit smaller. This is about the best you can do with the gz format.
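Because advdef can corrupt its output, it is worth checking a recompressed file before replacing the original. The sketch below shows one way such a check could look; it is not the gz99 script, and the function name same_payload and all file names are hypothetical. Since advdef may not be installed, gzip -9 stands in for the recompression step here.

```shell
#!/bin/sh
# Sketch of a post-recompression check; this is NOT the gz99 script itself.
same_payload() {
  # Succeed only if both files are valid gzip streams
  # that decompress to identical bytes.
  gzip -t "$1" 2>/dev/null || return 1
  gzip -t "$2" 2>/dev/null || return 1
  tmp=$(mktemp) || return 1
  gzip -dc "$1" > "$tmp"
  gzip -dc "$2" | cmp -s - "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'some example payload\n' > "$workdir/payload"
gzip -1 -c "$workdir/payload" > "$workdir/original.gz"
# gzip -9 stands in for advdef here, since advdef may not be installed
gzip -9 -c "$workdir/payload" > "$workdir/recompressed.gz"
# A file with different content, to show the check failing
printf 'different payload\n' | gzip -c > "$workdir/other.gz"

if same_payload "$workdir/original.gz" "$workdir/recompressed.gz"; then
  check_same=match
else
  check_same=mismatch
fi
if same_payload "$workdir/original.gz" "$workdir/other.gz"; then
  check_other=match
else
  check_other=mismatch
fi
echo "recompressed copy: $check_same"
echo "unrelated file:    $check_other"
```

With the real tool, one could run advdef -z -4 on a copy of file.tar.gz and only move the copy over the original if same_payload succeeds.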
Since you only recently learnt that tar can compress, and didn't say why you wanted the smallest ".tar.gz" file, you may be unaware that there are more efficient formats that can be used with tar files, such as xz. Generally, switching to a different format gives a far bigger improvement in compression than fiddling around with gzip options. The main disadvantage of xz is that it isn't as common as gzip, so the people you send the file to might have to install a new package. It also tends to be a bit slower, particularly when compressing. If this doesn't matter to you, and you really want the smallest tar file, try:
tar cv path/to/data | xz -9 > file.tar.xz
Modern versions of tar, for example on Ubuntu 13.10, automatically detect compressed files. So even if you use xz compression you can still decompress as usual:
tar xvf file.tar.xz
To give a quick idea of how these compression utilities compare, consider the effect of compressing patch-3.1.1 from the Linux kernel:
utility            cpu     format   size (bytes)
gzip -9            0.02s   gz       105,628
advdef -2          0.07s   gz       102,619
7z -mx=9 -tgzip    0.42s   gz       102,297
advdef -3          0.55s   gz       102,290
advdef -4          0.75s   gz       101,956
xz -9              0.03s   xz        91,064
xz -3e             0.15s   xz        90,996
In this trivial example, we see that to get the smallest gz we need advdef (though 7z -tgzip is almost as good and a lot less buggy). We also see that switching to xz gains us much more space than trying to squeeze the most out of the old gz format, without compression taking too long.
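If you want to repeat this kind of comparison on your own machine, a rough sketch follows. It only covers gzip and xz (7z and advdef take different arguments and are left out), it skips any tool that is not installed, and the generated sample data and file names are invented, so the sizes will not match the table above.

```shell
#!/bin/sh
# Sketch: compress one sample tarball with whichever tools are installed
# and list the resulting sizes. Sample data and paths are made up.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/data"
i=0
while [ "$i" -lt 2000 ]; do
  echo "entry $i: sample text to compress"
  i=$((i + 1))
done > "$workdir/data/sample.txt"
tar cf "$workdir/data.tar" -C "$workdir" data

for cmd in "gzip -9" "xz -9" "xz -3e"; do
  tool=${cmd%% *}
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "$cmd: not installed, skipped"
    continue
  fi
  # Output name such as out.gzip9 or out.xz3e
  out="$workdir/out.$(echo "$cmd" | tr -d ' -')"
  # $cmd is intentionally unquoted so the option is split from the tool name
  $cmd -c "$workdir/data.tar" > "$out"
  size=$(($(wc -c < "$out")))
  printf '%-10s %8d bytes\n' "$cmd" "$size"
done
```

Point the tar command at a real directory to get figures that are meaningful for your data.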
Solution 4:
tar c /path/to/data | gzip --best > file.tar.gz
The gzip option --best (equivalent to -9) asks for the highest compression level.