Which is more efficient - tar or zip compression? What is the difference between tar and zip?
Solution 1:
tar only makes a single file out of multiple files; it doesn't do compression unless combined with a compression program such as gzip or bzip2 (which you can call from within tar by using the -z or -j options, respectively). zip combines both archiving and compression in one program.
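To make the archiving/compression split concrete, here is a minimal sketch using Python's standard tarfile and zipfile modules rather than the command-line tools. The payload and the member name sample.txt are made up for illustration; mode "w" corresponds to plain tar and "w:gz" to tar with gzip (roughly what -z does).

```python
import io
import tarfile
import zipfile

# Hypothetical sample data: one highly compressible in-memory "file".
payload = b"hello world\n" * 10_000


def make_tar(mode):
    """Build a tar archive in memory; "w" is plain tar, "w:gz" adds gzip."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo(name="sample.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def make_zip():
    """Build a zip archive in memory; zip archives and compresses in one step."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("sample.txt", payload)
    return buf.getvalue()


plain_tar = make_tar("w")    # archive only, no compression
tar_gz = make_tar("w:gz")    # archive, then gzip the whole stream
zipped = make_zip()          # archive and compress together

print("raw payload:", len(payload))
print("plain tar:  ", len(plain_tar))
print("tar + gzip: ", len(tar_gz))
print("zip:        ", len(zipped))
```

The plain tar comes out slightly larger than the input (headers and padding), while the gzip-compressed tar and the zip are both much smaller.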
Solution 2:
tar
- Assumes you'll be reading from one end to the other - "Tape ARchive". (The age of the command shows...)
- Does not do compression itself, but you can compress the entire resulting stream by piping it through e.g. gzip or bzip2 (done internally with -z or -j, respectively)
- Stores unix file attributes: uid, gid, permissions (most notably executable). The default handling of these may depend on your distribution, and can be toggled with options.
zip
- Stores MSDOS attributes. (Archive, Readonly, Hidden, System)
- Compresses each file, then adds them to an archive
- Includes a file table at the end of the file
- As a result of the previous two points, allows reading only the parts of the archive that belong to the file you need.
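The random-access property of zip can be sketched with Python's standard zipfile module. This is only an illustration; the member names (file_000.txt and so on) are invented. namelist() is answered from the file table at the end of the archive, and read() then seeks to and decompresses just the one requested member.

```python
import io
import zipfile

# Build a small zip in memory with 100 hypothetical members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for i in range(100):
        zf.writestr(f"file_{i:03d}.txt", f"contents of file {i}\n")

# Reopen it: the member list comes from the file table at the end,
# and read() only decompresses the single member that was asked for.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    wanted = zf.read("file_042.txt")

print(len(names), wanted)
```

With a gzip- or bzip2-compressed tar there is no such table, so finding one member generally means decompressing and scanning the stream from the start until it turns up.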
The fact that zip compresses the files separately hurts its compression ratio, particularly on many small similar files, because redundancy shared between files cannot be exploited.
(At least this was exactly correct a decade ago.)
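A rough way to see the effect on many small similar files is the sketch below, again using Python's tarfile and zipfile modules. The 500 JSON-like records and their file names are synthetic, so the exact numbers mean little; only the direction of the difference is of interest.

```python
import io
import tarfile
import zipfile

# Synthetic data: 500 small files that differ only in a few digits.
files = {
    f"user_{i:04d}.json": (
        f'{{"id": {i}, "name": "user{i}", "email": "user{i}@example.com", '
        f'"active": true, "roles": ["reader", "writer"]}}\n'
    ).encode()
    for i in range(500)
}

# tar + gzip: all members are concatenated, then compressed as one stream.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:gz") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
tar_gz_size = len(tar_buf.getvalue())

# zip: every member is compressed on its own, then added to the archive.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)
zip_size = len(zip_buf.getvalue())

raw_size = sum(len(data) for data in files.values())
print("raw:", raw_size, "tar.gz:", tar_gz_size, "zip:", zip_size)
```

On data like this the compressed tar should come out noticeably smaller than the zip, since gzip sees the repetition across files while zip's per-file compression does not.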
Solution 3:
Tar preserves much more metadata than Zip, see my comparison (it's slightly outdated):
Tar passes 65% of the tests, whereas Zip passes only 17%. I have made the test suite available on GitHub under a BSD license, so you can try it yourself if you have a Mac. On Linux I'm not sure whether there is any comparable metadata, so these tests may not be relevant there.