Concatenate multiple tar files in one command

Solution 1:

This may not help you, but if you are willing to use the -i option when extracting from the final archive, you can simply cat the tars together. A tar file ends with a header block full of nulls, followed by more null padding up to the end of the record. With --concatenate, tar must walk through all the headers to find the exact position of that final header so that it can start overwriting there.
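To see that trailer for yourself, here is a minimal sketch, assuming GNU tar and made-up file names in a throwaway directory. The end-of-archive marker is two 512-byte blocks of nulls, so the last 1024 bytes of a freshly created archive should contain nothing else:

```shell
# Hypothetical file names, created in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "hello" > a.txt
tar -cf one.tar a.txt

# Strip every NUL byte from the last 1024 bytes (two 512-byte blocks)
# and count what is left over.
nonzero=$(tail -c 1024 one.tar | tr -d '\000' | wc -c | tr -d ' ')
size=$(wc -c < one.tar | tr -d ' ')

echo "non-NUL bytes in the last 1024 bytes: $nonzero"
echo "archive size in bytes: $size"
```

The reported size is normally much larger than the one small file, because tar pads the archive out to a full record.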

If you just cat the tars, you simply end up with extra nulls between headers. The -i (--ignore-zeros) option tells tar to ignore these nulls. So you can run:

cat receivedTar1.tar receivedTar2.tar ... >>alltars.tar
tar -itvf alltars.tar
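A self-contained way to try this out, assuming GNU tar (the file and archive names here are invented for the demonstration), is to build two small archives, cat them together, and compare the listing with and without -i:

```shell
# Hypothetical names; everything happens in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo one > a.txt
echo two > b.txt
tar -cf first.tar a.txt
tar -cf second.tar b.txt

# Plain byte-wise concatenation, no tar involved.
cat first.tar second.tar > alltars.tar

# Without -i, reading stops at the first end-of-archive marker.
plain=$(tar -tf alltars.tar)
# With -i, the zero blocks between the archives are skipped.
ignoring=$(tar -itf alltars.tar)

echo "without -i:"; echo "$plain"
echo "with -i:";    echo "$ignoring"
```

With GNU tar, the first listing should show only a.txt, while the second shows both members.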

Also, your tar --concatenate example ought to be working. However, if the same file name appears in several of the tar archives, that file will be overwritten several times when you extract everything from the resulting tar.
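The overwrite behaviour is easy to check on a small scale. This is only a sketch, assuming GNU tar with default extraction options, and same.txt is a made-up name: the same file name is archived twice with different contents, and after extracting the concatenated archive only the copy from the last archive remains.

```shell
# Hypothetical names; everything happens in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two archives that both contain a member called same.txt.
echo "first version" > same.txt
tar -cf one.tar same.txt
echo "second version" > same.txt
tar -cf two.tar same.txt

cat one.tar two.tar > both.tar

# Extract everything into a separate directory; the second same.txt
# is written over the first one.
mkdir out
tar -ixf both.tar -C out
result=$(cat out/same.txt)
echo "$result"
```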

Solution 2:

This question is rather old, but I wish it had been easier for me to find the following information sooner. So if anyone else stumbles across this, enjoy:

What Jeff describes above is a known bug in GNU tar (reported in August 2008). Only the first archive (the one after the -f option) gets its EOF marker removed. If you try to concatenate more than two archives, the last archive(s) will be "hidden" behind end-of-file markers.

It is a bug in tar. It concatenates entire archives, including trailing zero blocks, so by default reading the resulting archive stops after the first concatenation.

Source: https://lists.gnu.org/archive/html/bug-tar/2008-08/msg00002.html (and following messages)

Considering the age of the bug, I wonder if it will ever get fixed. I doubt that a critical mass of users is affected.

The best way to circumvent this bug could be to use the -i option, at least for .tar files on your file system.
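To check whether your own tar behaves this way, a small reproduction along the following lines may help. It is a sketch with invented file names, and what the plain listing shows depends on your tar version; the -i listing should show all three members either way.

```shell
# Hypothetical names; everything happens in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three single-member archives.
for n in 1 2 3; do
    echo "payload $n" > "file$n.txt"
    tar -cf "archive$n.tar" "file$n.txt"
done

# Append archive2.tar and archive3.tar to archive1.tar.
tar --concatenate -f archive1.tar archive2.tar archive3.tar

# If the bug is present, this listing stops before file3.txt.
echo "without -i:"
tar -tf archive1.tar

# Ignoring zero blocks reveals any members hidden behind an EOF marker.
echo "with -i:"
with_i=$(tar -itf archive1.tar)
echo "$with_i"
```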

As Jeff points out, tar --concatenate can take a long time to reach the EOF before it concatenates the next archive. So if you are going to be stuck with a "broken" archive that needs the tar -i option to untar anyway, I suggest the following:

Instead of running tar --concatenate -f archive1.tar archive2.tar archive3.tar, you will likely be better off running cat archive2.tar archive3.tar >> archive1.tar, or piping to dd if you intend to write to a tape device. Note that this could lead to unexpected behaviour if the tapes were not zeroed before new data is (over)written onto them. For that reason, the approach I am going to take in my application is nested tars, as suggested in the comments below the question.
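For reference, the nested-tar idea can be sketched as follows. This is my own reading of that suggestion, assuming GNU tar, and all names are invented: the inner archives become ordinary members of an outer archive, so no end-of-archive markers need to be removed or ignored.

```shell
# Hypothetical names; everything happens in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo one > a.txt
echo two > b.txt
tar -cf archive1.tar a.txt
tar -cf archive2.tar b.txt

# Outer archive whose members are the inner archives themselves.
tar -cf nested.tar archive1.tar archive2.tar
outer=$(tar -tf nested.tar)

# Stream one inner archive to stdout and list it without writing it to disk.
inner=$(tar -xf nested.tar --to-stdout archive2.tar | tar -tf -)

echo "outer members:";            echo "$outer"
echo "members of archive2.tar:";  echo "$inner"
```

The trade-off is that reading a file from an inner archive takes two steps instead of one.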

The above suggestion is based on the following very small sample benchmark:

time tar --concatenate -vf buffer.100025.tar buffer.100026.tar
  real  65m33.524s
  user  0m7.324s
  sys   2m50.399s

time cat buffer.100027.tar >> buffer.100028.tar
  real  46m34.101s
  user  0m0.853s
  sys   1m46.133s

The buffer.*.tar files are all 100 GB in size, and the system was pretty much idle apart from each of these calls. The time difference is significant enough that I personally consider this benchmark valid despite the small sample size, but you are free to form your own judgement on this, and you are probably best off running a benchmark like this on your own hardware.
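If you want to repeat the comparison, a rough harness could look like the sketch below. The file names and the size_mib variable are made up, the default size is far too small for a meaningful result, and date +%s only gives whole seconds, so scale the size up to something representative of your own archives.

```shell
# Hypothetical names; everything happens in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
size_mib=8   # assumption: tiny test size, increase for real measurements

# Two source archives filled with random data.
for n in 1 2; do
    dd if=/dev/urandom of="payload$n.bin" bs=1048576 count="$size_mib" 2>/dev/null
    tar -cf "src$n.tar" "payload$n.bin"
done

# Give each method its own copy of the target archive.
cp src1.tar via_tar.tar
cp src1.tar via_cat.tar

start=$(date +%s)
tar --concatenate -f via_tar.tar src2.tar
end=$(date +%s)
echo "tar --concatenate: $((end - start)) s"

start=$(date +%s)
cat src2.tar >> via_cat.tar
end=$(date +%s)
echo "cat >>: $((end - start)) s"

# Sanity check: with -i both results should list the same members.
list_tar=$(tar -itf via_tar.tar)
list_cat=$(tar -itf via_cat.tar)
```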