Tar extract a stream of tar files

I've got a binary (let's call it displayFiles) which downloads several tar files at once and prints them to stdout. I then pipe that stdout to tar.

This works fine when the binary downloads only a single tar. If it downloads several tars at once and I pipe them all to tar, is that likely to work? The displayFiles program does not pause between one file and the next.

E.g:

./displayFiles | { tar -xvf -; }

Solution 1:

The end of a tar archive is marked by two consecutive zero-filled 512-byte records. When reading from stdin, tar should stop after reading the two zero-filled records, so data that follows (if any) can be read by the next tool.

If the data that follows is another tar archive and the next tool is another tar, then it will work too. To extract N concatenated archives you need to call tar N times. Unless…
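A quick illustration of this behavior (the filenames a.txt and b.txt are made up for the demo): a single default tar invocation stops at the first end-of-archive marker and silently ignores the second archive.

```shell
#!/bin/sh
# Sketch with made-up filenames: concatenate two archives, then extract
# with a single default tar call; only the first archive is unpacked.
set -e
dir=$(mktemp -d)
cd "$dir"
echo one > a.txt
echo two > b.txt
tar -cf a.tar a.txt
tar -cf b.tar b.txt
cat a.tar b.tar > both.tar   # two concatenated archives in one file
mkdir out
tar -xf both.tar -C out      # stops at the first end-of-archive marker
ls out                       # only a.txt; the second archive is ignored
```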

GNU tar supports --ignore-zeros, so a single tar invocation can extract a stream of concatenated archives. Not all such streams, though (see the caveats about extra data further down). From the manual:

Normally, tar stops reading when it encounters a block of zeros between file entries (which usually indicates the end of the archive). --ignore-zeros (-i) allows tar to completely read an archive which contains a block of zeros before the end (i.e., a damaged archive, or one that was created by concatenating several archives together).

The --ignore-zeros (-i) option is turned off by default because many versions of tar write garbage after the end-of-archive entry, since that part of the media is never supposed to be read. […]

(source)

GNU tar with --ignore-zeros (-i) is the first thing to try:

./displayFiles | tar -xivf -
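To see the flag in action without displayFiles, here is a sketch using two small archives as a stand-in for its output (filenames are assumptions): one tar with -i reads past the zero blocks and extracts both.

```shell
#!/bin/sh
# Sketch: --ignore-zeros (-i) lets a single GNU tar process a stream of
# concatenated archives. The cat pipeline stands in for ./displayFiles.
set -e
dir=$(mktemp -d)
cd "$dir"
echo one > a.txt
echo two > b.txt
tar -cf a.tar a.txt
tar -cf b.tar b.txt
mkdir out
cat a.tar b.tar | tar -xi -C out -f -   # like ./displayFiles | tar -xivf -
ls out                                  # both a.txt and b.txt
```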

If you cannot use GNU tar or any implementation that supports something like --ignore-zeros, then you need to call tar N times. If N is not known in advance, run tar in a loop until it fails:

./displayFiles | while tar -xvf -; do :; done
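A sketch of the loop with demo archives standing in for displayFiles (filenames are assumptions): every tar in the loop shares the same stdin, so each call consumes one archive, and the final tar fails at end of input, which ends the loop.

```shell
#!/bin/sh
# Sketch: each tar in the loop reads one archive from the shared stdin.
# This relies on archives being padded to tar's blocking factor, which
# holds for archives created by GNU tar with default options.
set -e
dir=$(mktemp -d)
cd "$dir"
echo one > a.txt
echo two > b.txt
tar -cf a.tar a.txt
tar -cf b.tar b.txt
mkdir out
cd out
cat ../a.tar ../b.tar | while tar -xf - 2>/dev/null; do :; done
ls                       # a.txt and b.txt, one per tar invocation
```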

In the best case, expect a This does not look like a tar archive error from the final tar, which tries to read after displayFiles closes its stdout.

Note that a tar archive with extra data (or garbage) after the two consecutive zero-filled 512-byte records is still valid (i.e. tar will extract it just fine). If such an archive gets into our loop, then the next tar will read the extra data. Except in a few edge cases, the extra data will make that tar fail, which ends the loop early. But even if the loop continued, the extra data would (again, except in a few edge cases) "desynchronize" the stream, so each following tar would start reading somewhere other than the beginning of a concatenated archive. Failing early is therefore not a bad thing.

Hopefully the tar archives coming from displayFiles don't contain extra data. If they do, then there is no easy and reliable way to find the individual archives in a stream of concatenated archives. This problem exists whether you use --ignore-zeros or the loop.

If you use the loop, one way or another some tar in it will fail, and in general you won't know whether all the data was processed. For this reason, consider adding a command that tells you whether the stream was depleted. Example:

./displayFiles | (while tar -xvf -; do :; done; exit "$(head -c 1 | wc -c)")

(head -c 1 is not portable. A portable replacement is dd bs=1 count=1 2>/dev/null.)
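Putting it together with the portable dd variant, again using made-up demo archives in place of displayFiles: after the loop, dd tries to read one more byte, wc -c counts 0 bytes if the stream was depleted, and that count becomes the subshell's exit status.

```shell
#!/bin/sh
# Sketch: exit status 0 from the subshell means nothing was left in the
# stream after the loop finished. Demo filenames are assumptions.
dir=$(mktemp -d)
cd "$dir"
echo one > a.txt
echo two > b.txt
tar -cf a.tar a.txt
tar -cf b.tar b.txt
mkdir out
cd out
cat ../a.tar ../b.tar | ( while tar -xf - 2>/dev/null; do :; done
                          exit "$(dd bs=1 count=1 2>/dev/null | wc -c)" )
status=$?
echo "depleted: $status"   # 0 means the whole stream was consumed
```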

Exit status 0 means all the data was consumed by the tar process(es). It means nothing more (in particular, it doesn't mean there were no meaningful errors).