Compressing a file in place - does "gzip -c file | dd of=file" really work?

Experiment shows that this does not work.

I created a 20-megabyte file from /dev/urandom (keeping an untouched copy as orig), then tried the above command on it. Here are the results:

% ls -l
total 41008
-rw-r--r-- 1 kst kst 20971520 2012-01-18 03:47 file
-rw-r--r-- 1 kst kst 20971520 2012-01-18 02:48 orig
% gzip -c file | dd of=file
0+1 records in
0+1 records out
25 bytes (25 B) copied, 0.000118005 s, 212 kB/s
% ls -l
total 20508
-rw-r--r-- 1 kst kst       25 2012-01-18 03:47 file
-rw-r--r-- 1 kst kst 20971520 2012-01-18 02:48 orig
$ 
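
For anyone who wants to reproduce this, the test files can be recreated with something like the following (my reconstruction, not necessarily the exact commands used originally):

dd if=/dev/urandom of=file bs=1M count=20 iflag=fullblock   # 20 MiB of random data (GNU dd)
cp file orig                                                # keep an untouched copy for comparison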

Obviously a 20-megabyte random file won't compress to 25 bytes, and in fact running gunzip on the compressed file yields an empty file.

I got similar results for a much smaller random file (100 bytes).

So what happened?

In this case, dd truncated file to zero length before it started writing; gzip then read from the newly empty file and produced 25 bytes of output, which dd wrote to the now-empty file. (An empty file still "compresses" to a nonzero size; it's theoretically impossible for any lossless compressor to make every input smaller.)
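
The 25 bytes are consistent with that: a gzip stream for empty input is 20 bytes (a 10-byte header, a 2-byte deflate encoding of the empty stream, and an 8-byte trailer holding the CRC and uncompressed length), and because gzip was given a filename rather than reading stdin, it also stored the name "file" plus a terminating NUL in the header, which accounts for the other 5. The empty-input size is easy to check:

gzip -c < /dev/null | wc -c   # with GNU gzip this reports 20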

Other results are possible, depending on the relative timing of the gzip and dd processes, which run in parallel.

There's a race condition because one process, gzip, reads from file while another process, dd, truncates and writes to it in parallel.
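
The same trap shows up, without any timing uncertainty, in the classic sort file > file: there it is the shell that truncates file (when it sets up the redirection) before sort ever runs, so the result is always an empty file:

sort file > file   # the shell truncates file before sort starts; file always ends up empty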

It should be possible to implement an in-place file compressor that reads from and writes to the same file, using whatever internal buffering is necessary to avoid clobbering data. But I've never heard of anyone actually implementing one, probably because it's rarely necessary, and because if the compressor failed partway through, the file would be left corrupted with no way to recover the original.
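
The closest ready-made approximation at the shell level is to buffer the whole compressed stream before touching the file, which is what sponge from moreutils does (assuming it's installed). This avoids the race, though it isn't truly in-place on disk, since the data is held in memory or a temporary file first:

gzip -c file | sponge file   # sponge soaks up all of its input before it opens and writes file

And of course plain gzip file sidesteps the problem entirely, by writing file.gz and only then removing file, at the cost of briefly needing room for both copies.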