How do I make a compressed tar archive when there are too many filenames for the shell to expand on a single line?

Normally I would just do something like:

tar -czf archive.tar.gz *.csv

But when the directory contains too many files, this fails with an "Argument list too long" error, because the expanded list of names exceeds the system's limit on command-line arguments.

In these cases I would normally resort to using find. Something like:

find /path -name '*.csv' -exec tar -rf ./archive.tar {} +

But this only works if I leave out the -z option, because you can't append to a compressed archive. Using -c instead of -r doesn't help either: find may run tar several times, and each run overwrites the archive written by the previous one.

The only other solution I could come up with is to create a .tar file with find (as above) and then use a second command to compress it. Is there a better way to handle cases like this?

I'm using Ubuntu Linux.
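For reference, the two-step approach mentioned above (build an uncompressed archive with find, then compress it) can be sketched like this. It is a minimal demo that assumes GNU tar and gzip, and it runs in a throwaway mktemp directory with made-up sample file names:

```shell
set -euo pipefail

# Throwaway directory with a few hypothetical sample CSV files (demo only).
demo=$(mktemp -d)
cd "$demo"
for i in 1 2 3 4 5; do
  echo "a,b" > "file$i.csv"
done

# Step 1: build an *uncompressed* archive. -r appends, so this is safe even
# if find splits the list of names across several tar invocations.
find . -maxdepth 1 -name '*.csv' -exec tar -rf archive.tar {} +

# Step 2: compress the finished archive; gzip replaces archive.tar with archive.tar.gz.
gzip archive.tar
```

It works, but it writes the data twice (once as a plain tar, once compressed), which is what a single-pass solution avoids.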


Solution 1:

As a robust solution, use find to terminate each filename with a null character and pipe the list directly to tar, which reads null-delimited input when given --null:

find . -maxdepth 1 -name '*.csv' -print0 |
tar -czf archive.tgz --null -T -

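This handles every file name correctly, including names with spaces or newlines, and is not limited by the number of files, because the list is read from standard input instead of being passed as command-line arguments.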
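To check that claim, here is a small sketch (assuming bash for the $'...' quoting, plus GNU find and GNU tar as shipped with Ubuntu) that creates hypothetical files with a space and an embedded newline in their names and archives them with the same pipeline:

```shell
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"

# Hypothetical sample files: one with a space, one with an embedded newline.
echo "1,2" > 'with space.csv'
echo "3,4" > $'line\nbreak.csv'
echo "5,6" > plain.csv

# find terminates each name with NUL; tar --null -T - reads that list from stdin.
find . -maxdepth 1 -name '*.csv' -print0 |
  tar -czf archive.tgz --null -T -

# List the archive contents.
tar -tzf archive.tgz
```

Extracting the result into an empty directory (tar -xzf archive.tgz -C somedir) should give back all three files with their names intact.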

Using ls to generate a list of filenames for another program to parse is a common antipattern that should be avoided whenever possible. find can generate null-delimited output (-print0) that many utilities can read or process further, for example xargs -0, sort -z, or tar --null. Since the null character is the only character, apart from the / path separator, that cannot appear in a filename, a null-delimited list is always unambiguous.
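A quick way to see why newline-delimited lists are ambiguous: the sketch below (bash and GNU find assumed; the file names are hypothetical) counts the same two files both ways.

```shell
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"

# Two files, one of which has a newline in its name.
touch plain.csv $'two\nlines.csv'

# Newline-delimited: the second name is split across two lines, so this counts 3.
newline_count=$(find . -maxdepth 1 -name '*.csv' | wc -l)

# NUL-delimited: keep only the NUL terminators and count them, giving the true count of 2.
nul_count=$(find . -maxdepth 1 -name '*.csv' -print0 | tr -cd '\000' | wc -c)

echo "newline-delimited: $newline_count, NUL-delimited: $nul_count"
```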

Solution 2:

No, you cannot append to a compressed tar file without uncompressing it first.
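If you do need to add files to an existing .tgz, the workaround is exactly that: decompress, append, recompress. A minimal sketch, assuming GNU tar and gzip and using made-up file names in a scratch directory:

```shell
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
echo "1,2" > first.csv
echo "3,4" > second.csv

# An existing compressed archive containing only first.csv.
tar -czf ball.tgz first.csv

# tar cannot append to ball.tgz directly, so decompress to a plain tar,
# append with -r, then recompress over the original file.
gunzip -c ball.tgz > ball.tar
tar -rf ball.tar second.csv
gzip -c ball.tar > ball.tgz
rm ball.tar
```

This rewrites the whole archive, so for a large .tgz it costs about as much as building it from scratch.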

However, tar can read the list of files to archive from a file with -T (--files-from), so you can just do:

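# printf is a bash builtin, so the long *.csv expansion never hits the
# argument-length limit (ls *.csv would fail just like tar *.csv)
printf '%s\n' *.csv > temp.txt
tar -zcf ball.tgz -T temp.txt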
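If you would rather skip the temporary file, the list can be streamed straight into tar. This is a swapped-in variant, not part of the original answer: it assumes bash (where printf is a builtin) and GNU tar's --null option, and the sample file names are hypothetical.

```shell
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
for i in 1 2 3; do
  echo "x,y" > "data $i.csv"
done

# printf is a bash builtin, so the expanded *.csv list is never handed to
# an external command and cannot trigger "Argument list too long".
# '%s\0' ends each name with a NUL byte, which tar reads via --null -T -.
printf '%s\0' *.csv | tar -czf ball.tgz --null -T -
```

Because the names are NUL-terminated, this version also copes with filenames that contain newlines.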

@slhck points out that unusual characters in filenames can cause trouble here. With GNU tar (the version on Ubuntu), -T reads one name per line, so names containing spaces already work as-is. Don't wrap the names in double quotes: tar does not strip them, so it would look for files whose names literally include the quote characters. The list does break if a filename contains a newline, in which case you get what you deserve. :)