Why aren't all files compressed, and how can I improve the solution?
I have a folder with about 20K files. The files are named according to the pattern xy_\d{1,5}_\d{4}\.abc, e.g. xy_12345_1234.abc. I wanted to compress the first 10K of them using this command:
ls | sort -n -k1.4,1.9 | head -n10000 | xargs tar -czf xy_0_10000.tar.gz
However, the resulting archive contained only about 2K files, whereas
ls | sort -n -k1.4,1.9 | head -n10000 | wc -l
returns 10000, as expected.
It seems to me that I am misunderstanding something basic here...
I am using zsh 5.0.2 on Linux Mint 17.1, GNU tar 1.27.1
EDIT:
Forking, as suggested by @Archemar, sounds very plausible, with the last invocation overwriting the resulting file: the archive contains the 'tail' of the list, files 7773 to 9999.
Result of xargs --show-limits:
Your environment variables take up 3973 bytes
POSIX upper limit on argument length (this system): 2091131
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2087158
Size of command buffer we are actually using: 131072
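A rough back-of-the-envelope check, assuming (hypothetically) that every name has the longest form the pattern allows, 17 characters such as xy_12345_1234.abc: xargs stores each argument plus a terminating NUL byte, so 10000 names need about 180000 bytes, more than the 131072-byte buffer above. In this sketch the newline printed after each generated name stands in for that NUL byte.

```shell
#!/bin/sh
# Hypothetical worst case: 10000 names of the form xy_NNNNN_1234.abc
# (17 characters each). The newline after each name plays the role of
# the NUL terminator xargs stores after every argument.
bytes=$(printf 'xy_%05d_1234.abc\n' $(seq 1 10000) | wc -c | tr -d ' ')
buffer=131072   # "Size of command buffer we are actually using" above

echo "names need $bytes bytes, xargs buffer is $buffer bytes"
if [ "$bytes" -gt "$buffer" ]; then
    # More bytes than one buffer holds: xargs must run tar at least twice.
    echo "xargs has to split the list into at least 2 tar invocations"
fi
```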
Replacing -c with -r or -u did not work in my case. The error message was
tar: Cannot update compressed archives
Using both -r and -u is invalid and fails with
tar: You may not specify more than one '-Acdtrux', '--delete' or '--test-label' option
Replacing -c with -a seems to be invalid as well, and fails with a different message:
tar: You must specify one of the '-Acdtrux', '--delete' or '--test-label' options
though I don't understand the issue, since azf and Acdtrux seem disjoint to me.
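Since tar refuses to update compressed archives, one possible workaround (a sketch assuming GNU tar and gzip; demo.tar and the files f01 to f10 are hypothetical) is to let every xargs batch append to an uncompressed archive with -r alone, and compress once at the end. Because -r adds to the archive instead of recreating it, several tar runs no longer overwrite each other.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"
for i in 01 02 03 04 05 06 07 08 09 10; do
    : > "f$i"                         # ten empty test files, f01 .. f10
done

# -n 3 forces several tar runs, like xargs hitting its buffer limit.
# -r appends, so later batches add to demo.tar instead of replacing it
# (GNU tar creates demo.tar on the first run).
printf '%s\n' f?? | xargs -n 3 tar -rf demo.tar

gzip demo.tar                         # compress once at the end -> demo.tar.gz
count=$(tar -tzf demo.tar.gz | wc -l | tr -d ' ')
echo "archive holds $count files"     # all ten, not just the last batch
```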
EDIT 2:
-T looks like a good way; I have also found an example here.
However, when I try
ls | sort -n -k1.4,1.9 | head -n10000 | tar -czf xy_0_10000.tar.gz -T -
I get
tar: option requires an argument -- 'T'
Well, perhaps the filenames don't reach tar? But it looks like they do, because when I execute
ls | sort -n -k1.4,1.9 | head -n10000 | tar --null -czf xy_0_10000.tar.gz -T -
I get
tar: xy_0_.ab\nxy_1_...<the rest of filenames separated by literal \n>...998.ab: Cannot stat: File name too long
So why is tar not seeing the filenames?
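A small sketch of what --null changes, assuming GNU tar and hypothetical files a, b and c: with --null, -T - expects NUL-terminated names, so a newline-separated list arrives as a single long name, which fits the 'File name too long' error above. Either drop --null or convert the newlines with tr '\n' '\0'.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"
: > a; : > b; : > c                   # three empty test files

# Without --null, -T - reads one name per line.
printf '%s\n' a b c | tar -czf lines.tar.gz -T -

# With --null, names must be NUL-terminated, so convert the newlines first.
# (--null has to come before the -T it applies to.)
printf '%s\n' a b c | tr '\n' '\0' | tar --null -czf nul.tar.gz -T -

# --null with a plain newline-separated list: tar sees one name "a\nb\nc\n".
if ! printf '%s\n' a b c | tar --null -czf bad.tar.gz -T - 2>/dev/null; then
    echo "--null with a newline-separated list fails"
fi

lines=$(tar -tzf lines.tar.gz | wc -l | tr -d ' ')
nul=$(tar -tzf nul.tar.gz | wc -l | tr -d ' ')
echo "newline list: $lines files, NUL list: $nul files"
```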
Maybe you've hit the xargs limit? Check it with
xargs --show-limits
Try:
- create a dummy .tgz file: tar czf xy_0_10000.tar.gz /hello/world
- replace -czf by -Azf
When xargs hits its limit, it runs the command several times, each time with the next chunk of arguments, so the commands you ultimately ran were
tar czf xy_0_10000.tar.gz file1 file2 .... file666
tar czf xy_0_10000.tar.gz file667 file668 ... file1203
tar czf xy_0_10000.tar.gz file1204 ... file2000
As each tar overwrites the previous archive, you should be getting only the files from the last tar c run.
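The overwriting can be reproduced in miniature. In this sketch (hypothetical files f01 to f10, GNU tar assumed), xargs -n 3 stands in for xargs running out of buffer space: it starts a new tar -czf for every three names, and each run recreates demo.tar.gz.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"
for i in 01 02 03 04 05 06 07 08 09 10; do
    : > "f$i"                         # ten empty test files, f01 .. f10
done

# Runs: tar -czf demo.tar.gz f01 f02 f03, then f04 f05 f06, then
# f07 f08 f09, then f10. Each run overwrites the previous archive.
printf '%s\n' f?? | xargs -n 3 tar -czf demo.tar.gz

listing=$(tar -tzf demo.tar.gz)
echo "archive contains: $listing"     # only the last batch survives
```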
Edit:
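1) According to man tar on Ubuntu, appending is done by either -A, --catenate, --concatenate or -r, --append. They are not equivalent: -A appends the contents of other tar archives, while -r appends files. Lowercase -a is --auto-compress, which selects compression but is not an operation, hence the 'You must specify one of' error.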
2) zip (not gzip) can be used to add files to an existing archive; maybe a gzip option would do the trick. Use | xargs zip -qr xy_0_0000.zip; this will result in a zip file rather than a .tar.gz, however.
3) To use @rsanchez's solution, it is important to pass the option to tar properly; try
ls | sort -n -k1.4,1.9 | head -n10000 | tar -czf xy_0_10000.tar.gz -T -
where -T - means: use option -T, with - (standard input) as its argument. You could also have generated a list of files in /tmp/foo.lst, then used -T /tmp/foo.lst.
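A minimal sketch of the list-file variant from point 3, assuming GNU tar; a mktemp file stands in for /tmp/foo.lst, and f01 to f10 are hypothetical test files. A single tar process reads every name from the list, so nothing gets split or overwritten.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"
for i in 01 02 03 04 05 06 07 08 09 10; do
    : > "f$i"                         # ten empty test files, f01 .. f10
done

list=$(mktemp)                        # stands in for /tmp/foo.lst
printf '%s\n' f?? > "$list"           # one file name per line
tar -czf demo.tar.gz -T "$list"       # a single tar run reads every name

count=$(tar -tzf demo.tar.gz | wc -l | tr -d ' ')
echo "archived $count files"          # all ten
rm -f "$list"
```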
There's no need for xargs. If you give tar the -T - option directly, it will read the filenames from standard input.
For instance:
... | tar -T - -czf xy_0_10000.tar.gz
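Applied to a small hypothetical directory (25 generated files xy_0_1234.abc to xy_24_1234.abc, head -n10 instead of head -n10000, GNU tar assumed), the pipeline archives exactly the first ten names in numeric order. The archive is written to a separate directory so that ls cannot pick it up; in the real folder, an xy_0_10000.tar.gz left over from an earlier attempt has the sort key 0_1000 and would take one of the 10000 slots.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
out=$(mktemp -d)                      # archive goes here, outside the listed directory
cd "$demo"
for n in $(seq 0 24); do
    : > "xy_${n}_1234.abc"            # hypothetical files xy_0_1234.abc .. xy_24_1234.abc
done

# The question's pipeline with head -n10 instead of head -n10000.
# tar -T - reads the names from standard input, so tar runs exactly once.
ls | sort -n -k1.4,1.9 | head -n10 | tar -czf "$out/xy_0_10.tar.gz" -T -

count=$(tar -tzf "$out/xy_0_10.tar.gz" | wc -l | tr -d ' ')
last=$(tar -tzf "$out/xy_0_10.tar.gz" | tail -n 1)
echo "archived $count files, the last one is $last"
```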
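I want to complement the two other answers with a zsh solution, which neither parses ls nor needs xargs. It still passes all names to tar as arguments, so the kernel's argument-length limit applies, but that limit (2091131 bytes according to the question's xargs output) is far above the roughly 180 KB that 10000 names of at most 17 characters need; xargs split the list only because it uses its own, smaller 131072-byte buffer.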
- Define a function which generates your desired sorting key by modifying $REPLY:
sortkey() { REPLY=${REPLY[4,9]} }
This selects characters 4 to 9 of each name (zsh indices start at 1), the same key as in your
sort -n -k1.4,1.9
- Generate an array $files with the filenames sorted by the above function; the n qualifier makes the comparison numeric, like sort -n:
files=(*(no+sortkey))
This is equivalent to
ls | sort -n -k1.4,1.9
- Return the first 10 000 files with ${files[1,10000]} (zsh arrays start at index 1, and both ends of the range are inclusive).
This is equivalent to
ls | sort -n -k1.4,1.9 | head -n10000
So, all in all, this should do the trick:
sortkey() { REPLY=${REPLY[4,9]} }
files=(*(no+sortkey))
tar -czf xy_0_10000.tar.gz ${files[1,10000]}