Converting gzip files to bzip2 efficiently

This question was asked a long time ago, when pbzip2 either wasn't available or couldn't compress from stdin, but you can now parallelize both the decompression and compression steps using GNU parallel and pbzip2 (instead of bzip2):

ls *.gz | parallel "gunzip -c {} | pbzip2 -c > {.}.bz2"

which is significantly faster than using single-threaded bzip2.
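
Note that parsing the output of ls can misbehave with unusual filenames; GNU parallel can take the file list directly instead. A minor variation with the same effect:

parallel "gunzip -c {} | pbzip2 -c > {.}.bz2" ::: *.gz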


Rather than gunzip in one step and bzip2 in another, I wonder whether it would be more efficient to use a pipe. Something like:

gunzip --to-stdout foo.gz | bzip2 > foo.bz2

I'm thinking that with two or more CPUs this would be faster, since decompression and compression can run concurrently. It might help even with only a single core, because the intermediate uncompressed file never has to be written to disk. I shamefully admit to not having tried this out, though.
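
For what it's worth, here is a minimal sketch of that pipeline applied to a whole directory (untested; assumes every file ends in .gz):

for f in *.gz; do
    # Decompress to stdout, recompress, write alongside the original.
    gunzip -c "$f" | bzip2 -9 > "${f%.gz}.bz2" && rm "$f"
done

The && rm deletes the original only if bzip2 exited successfully; add set -o pipefail if you also want a gunzip failure to keep the original around.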


GNU parallel (http://www.gnu.org/software/parallel) might be an option if you have multiple cores (or even multiple machines):

ls *.gz | parallel "gunzip -c {} | bzip2 > {.}.bz2"

Read the tutorial / man page for details and options.
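
For instance (both options are covered in the man page; server1 and server2 below are placeholder hostnames), -j limits the number of simultaneous jobs, and -S spreads jobs across remote machines, with --trc (short for --transfer --return --cleanup) shipping each input file over and fetching the result back:

parallel -j4 "gunzip -c {} | bzip2 > {.}.bz2" ::: *.gz
parallel -S server1,server2 --trc {.}.bz2 "gunzip -c {} | bzip2 > {.}.bz2" ::: *.gz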


What you're currently doing is your best bet. There is no direct conversion tool, and running bzip2 on an already gzipped file is not really an option, since compressing already-compressed data usually gains little and can even grow the file. Because the two algorithms are different, conversion would require recovering the original data regardless, unless gzip were a step inside the bzip2 process, which unfortunately it isn't.


Occasionally, I need to do the same thing with log files. I start with the smallest *.gz files first (ls -rS), then gunzip and bzip2 them individually. I do not know whether it is possible to direct the gunzip output straight into the bzip2 input; bzip2 is so much slower at compressing than gunzip is at decompressing that I worry it might consume the memory and swap space on the host.

Improvements or suggestions are welcome. Here is my one-liner:

for i in $(ls -rS *.gz | sed 's/\.gz//'); do gunzip ${i}.gz; bzip2 -9 ${i}; done
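
If piping does turn out to be safe (and it should be: a pipe's buffer is a fixed size, so a slow bzip2 simply blocks gunzip rather than filling memory), a variant of the same loop could avoid ever writing the uncompressed data to disk. A sketch, untested, still smallest-first and still assuming filenames without whitespace:

for f in $(ls -rS *.gz); do gunzip -c "$f" | bzip2 -9 > "${f%.gz}.bz2" && rm "$f"; done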