Use tar vs bzip2 for creating a .tar.bz2 archive?

I just read that tar can create .tar.bz2 files. All this while, I was using tar + bzip2 to do this.

I was wondering if there was a difference between using tar to create the archive vs using bzip2?

Also, why have two things that do the same thing?

Thanks!


GNU tar supports the --bzip2 option (short form -j), which streams the tar archive through bzip2 before the result is written to disk. All of tar's built-in compression options work on streamed data. That doesn't matter much in the average case, but running a compressor separately may give you better compression, even with the same algorithm.

For example, --bzip2 applies a pre-defined compression level, while running bzip2 yourself on an uncompressed tarball gives you the opportunity to tune various compression parameters and perhaps achieve tighter compression. In addition, tools that can use the entire tarball as input might be able to take advantage of compression opportunities beyond the standard-sized chunks available through streaming. A good example is lrzip, which can exploit redundancies throughout the entire tar archive, rather than just redundancies within the current data chunk.
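As a rough illustration of tuning the compressor yourself, here is a minimal sketch. The scratch directory and file names are made up for the example. The -1 and -9 flags select bzip2's smallest and largest block sizes; whether any setting actually beats what you get from tar's built-in option depends on your data.

```shell
# Scratch area with some sample data (all names here are illustrative).
work=$(mktemp -d)
seq 1 200000 > "$work/data.txt"

# Step 1: build a plain, uncompressed tarball.
tar -cf "$work/data.tar" -C "$work" data.txt

# Step 2: compress it separately, choosing the level yourself.
# -c writes to stdout, so the original data.tar is left in place.
bzip2 -1 -c "$work/data.tar" > "$work/fast.tar.bz2"   # 100k blocks, quicker
bzip2 -9 -c "$work/data.tar" > "$work/best.tar.bz2"   # 900k blocks, usually tighter

# Compare the resulting sizes.
ls -l "$work"/data.tar "$work"/*.tar.bz2
```

Either output is an ordinary .tar.bz2, so tar -xjf will unpack both.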

Unless space is at an extreme premium, you're generally better off using the built-in compression options for simplicity's sake. The built-in options provide a high level of convenience and a reasonable trade-off between speed and compression. However, your mileage may definitely vary.
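To make the two approaches concrete, here is a small sketch that builds the same archive both ways. The directory and file names are invented for the example.

```shell
# Scratch directory with some sample data (names are illustrative).
work=$(mktemp -d)
mkdir "$work/project"
seq 1 20000 > "$work/project/numbers.txt"

# One step: tar runs the data through bzip2 for you.
tar -cjf "$work/builtin.tar.bz2" -C "$work" project

# Two steps: tar writes the archive to stdout and bzip2 compresses the stream.
tar -cf - -C "$work" project | bzip2 > "$work/piped.tar.bz2"

# Both files are ordinary .tar.bz2 archives with the same members.
tar -tjf "$work/builtin.tar.bz2"
tar -tjf "$work/piped.tar.bz2"
```

The one-step form is less typing; the pipeline form is where you would slot in different bzip2 flags or a different compressor.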


I was wondering if there was a difference between using tar to create the archive vs using bzip2?

Bzip2 is a compression tool ... and a very good one.

tar is an archiving (storage) tool ... perhaps the most successful and robust one.

Also, why have two things that do the same thing?

Well, tar predates bzip, and I think even the original zip.

They only do the "same thing" in some limited cases.

Perhaps you're not fully understanding what tar is about?

Tar is for archiving (the name comes from "tape archiver"); it does way more than compress a glob of files.

<quote man page>
DESCRIPTION
     Tar stores and extracts files from a tape or disk archive.
 </quote>

So you can add to, extract from, remove from, and update the contents of the archive (in many cases without unpacking it).
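A quick sketch of a few of those operations, using made-up file names. Note that it works on a plain, uncompressed .tar; as far as I know, operations that modify an archive in place (like appending) generally need it uncompressed.

```shell
# Scratch directory with two small files (names are illustrative).
work=$(mktemp -d)
echo "first"  > "$work/a.txt"
echo "second" > "$work/b.txt"

tar -cf "$work/demo.tar" -C "$work" a.txt   # create the archive with one file
tar -rf "$work/demo.tar" -C "$work" b.txt   # append a second file to it
tar -tf "$work/demo.tar"                    # list the members

# Pull out just one member, leaving the rest of the archive packed.
mkdir "$work/out"
tar -xf "$work/demo.tar" -C "$work/out" b.txt
```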

It's a classic tool that evolved in the far distant heroic past, when 64K was a lot of memory and streaming (character-by-character) storage, like magnetic tape, was the norm.

tar can do things like instruct some (nominally human) operator to change tapes, or spool forward/back to the correct range and location on many meters of tape.

All of this functionality is still useful. Even though the actual hardware is long gone, the concepts are still used, especially in the backup world.

Generally, it knows about the structure and state of an archived file system and can alter it or read from it more or less like a real, live file system (which of course it is, just packed up, possibly compressed, recorded/registered, and squirreled away someplace).

rsync has some (much) of its functionality, but with clever algorithms around diffs.

See also pax.

Bzip2 is a brilliant tool and in most cases can compress more heavily than zip/gzip.

It is efficient and quite robust as a general-purpose compression tool. Its focus is compression.

Over time, compressed files have come to be called archives, as it is something most of them have in common.

Oh, and finally, tar does have basic built-in compression as well. You use the -z switch (gzip) or the -j switch (bzip2) to choose which compression algorithm to apply.
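For instance, a minimal sketch of the two switches, plus unpacking again (the directory and file names are just placeholders):

```shell
# Scratch directory with a sample folder (names are illustrative).
work=$(mktemp -d)
mkdir "$work/docs"
seq 1 5000 > "$work/docs/list.txt"

tar -czf "$work/docs.tar.gz"  -C "$work" docs   # -z: compress with gzip
tar -cjf "$work/docs.tar.bz2" -C "$work" docs   # -j: compress with bzip2

# Unpack the bzip2 version somewhere else; -x extracts, -j decompresses.
mkdir "$work/restore"
tar -xjf "$work/docs.tar.bz2" -C "$work/restore"
```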

Hit the man pages for gzip, tar, bzip2, pax, and rsync :)

Just scrolling through the tar docs should give you the idea.