Tar extract a stream of tar files
I've got a binary (let's call it `displayFiles`) which will get several tar files at once and print the tars to stdout. I then pipe the stdout to `tar`.
This works fine when the binary only downloads a single tar. If I get several tars at once and pipe them to `tar`, is it likely to work? The `displayFiles` program will not pause between one file and the next.
E.g.:
./displayFiles | { tar -xvf -; }
Solution 1:
The end of a tar archive is marked by two consecutive zero-filled 512-byte records. When reading from stdin, `tar` should stop after reading the two zero-filled records, so the data that follows (if any) can be read by the next tool. If the data that follows is another tar archive and the next tool is another `tar`, then it will work. To extract N concatenated archives you need to call `tar` N times. Unless…
GNU `tar` supports `--ignore-zeros`, so a single `tar` can extract some concatenated archives. Not all, though.
> Normally, tar stops reading when it encounters a block of zeros between file entries (which usually indicates the end of the archive). `--ignore-zeros` (`-i`) allows tar to completely read an archive which contains a block of zeros before the end (i.e., a damaged archive, or one that was created by concatenating several archives together).
>
> The `--ignore-zeros` (`-i`) option is turned off by default because many versions of tar write garbage after the end-of-archive entry, since that part of the media is never supposed to be read. […]

(source)
GNU `tar` with `--ignore-zeros` (`-i`) is the first thing to try:
./displayFiles | tar -xivf -
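You can try out the `--ignore-zeros` behavior without `displayFiles` by concatenating two archives made on the spot. A minimal sketch; the file names and the work directory are made up for the demo:

```shell
# Build two small archives and concatenate them, simulating the stream.
set -eu
work=$(mktemp -d)
cd "$work"
echo first  > a.txt
echo second > b.txt
tar -cf a.tar a.txt
tar -cf b.tar b.txt
cat a.tar b.tar > stream.tar

# A single GNU tar with -i (--ignore-zeros) reads past the end-of-archive
# marker of a.tar and extracts b.txt as well.
mkdir out
cd out
tar -xif - < ../stream.tar
```

Without `-i`, the same invocation would stop after extracting `a.txt`.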
If you cannot use GNU `tar`, nor any implementation that supports something like `--ignore-zeros`, then you need to call `tar` N times. If N is not known in advance, run `tar` in a loop until it fails:
./displayFiles | while tar -xvf -; do :; done
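The loop can be tried out the same way, with the stream simulated by `cat` over two throwaway archives (a sketch; the names are illustrative). Each `tar` consumes one archive and exits 0; the `tar` that runs after the stream is exhausted fails, which ends the loop:

```shell
set -u
work=$(mktemp -d)
cd "$work"
echo one > 1.txt; tar -cf 1.tar 1.txt
echo two > 2.txt; tar -cf 2.tar 2.txt

mkdir out
# The final tar sees an empty stream, fails, and the while loop terminates.
cat 1.tar 2.tar | ( cd out && while tar -xf - 2>/dev/null; do :; done )
```

This relies on each archive being a whole multiple of `tar`'s blocking factor (GNU `tar` pads archives this way by default), so each `tar` stops exactly at an archive boundary.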
In the best case, expect `This does not look like a tar archive` from the `tar` that tries to read after `displayFiles` closes its stdout.
Note that a tar archive with extra data (or garbage) after the two consecutive zero-filled 512-byte records is still valid (i.e. `tar` will extract it just fine). If such an archive gets into our loop, the next `tar` will read the extra data. Except in one or a few edge cases, the extra data will make the tool fail and end the loop early. But even if we continued the loop, the extra data would (again, barring a few edge cases) "desynchronize" the stream, and each following `tar` would start reading somewhere other than where a concatenated archive begins. Failing early is therefore not bad.
Hopefully the tar archives coming from `displayFiles` don't contain extra data. If they do, there is no easy and reliable way to find the individual archives in a stream of concatenated archives. The problem exists whether you use `--ignore-zeros` or the loop.
If you use the loop, one way or another some `tar` in it will fail, and in general you won't know whether all the data was processed. For this reason, consider adding a command that tells you if the stream was depleted. Example:
./displayFiles | (while tar -xvf -; do :; done; exit "$(head -c 1 | wc -c)")
(`head -c 1` is not portable. A portable replacement is `dd bs=1 count=1 2>/dev/null`.)
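Putting the portable replacement in place, the whole pipeline can be tried out with the stream again simulated by `cat` (a sketch; `f.txt` and the work directory are made up, and `cat` stands in for `./displayFiles`):

```shell
set -u
work=$(mktemp -d)
cd "$work"
echo hi > f.txt
tar -cf f.tar f.txt

mkdir out
# The subshell exits 0 iff nothing is left in the stream after the
# last tar fails: dd then reads 0 bytes and wc -c prints 0.
cat f.tar f.tar | (
  cd out
  while tar -xf - 2>/dev/null; do :; done
  exit "$(dd bs=1 count=1 2>/dev/null | wc -c)"
)
status=$?
echo "depleted (0 = yes): $status"
```

Note that some `wc` implementations pad their output with spaces, which `exit` rejects; trimming the count (e.g. with `tr -d ' '`) makes the check more robust there.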
Exit status 0 means all the data was consumed by the `tar` process(es). It means nothing more (in particular, it doesn't mean there were no meaningful errors).