How can I compress a file on Linux in-place, without using additional disk space?

Solution 1:

This is a proof-of-concept bash one-liner, but it should get you started. Use at your own risk.

truncate -s `gzip -c file | dd of=file conv=notrunc 2>&1 | sed -n '$ s/ .*$// p'` file
mv file file.gz

This works by piping the gzipped data to a dd process that writes it back over the start of the same file (conv=notrunc keeps dd from truncating it). Once that pipeline completes, the file is truncated down to the size of the gzip output and renamed.

This assumes that the last line of dd's output matches:

4307 bytes (4.3 kB) copied, 2.5855e-05 s, 167 MB/s

Where the first field is the number of bytes written; that is the size the file needs to be truncated to. I'm not 100% sure the output format is always the same, since it can vary between dd versions and locales.
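If you want to be a little more defensive about that, here is a variant of the same idea as a sketch (assuming GNU gzip, dd and truncate, with file as a placeholder name): it pins dd's locale so the summary line stays in the English format, and pulls the byte count out with awk.

size=$(gzip -c file | LC_ALL=C dd of=file conv=notrunc 2>&1 | awk 'END { print $1 }')
truncate -s "$size" file
mv file file.gz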

Solution 2:

It's not so much that gzip and bzip2 overwrite the original. Rather, they write the compressed data to disk as a new file, and if that operation succeeds, they unlink the original uncompressed file.
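In other words, a plain gzip invocation behaves roughly like this sketch (somefile is a placeholder), which is exactly why it needs enough free space for the compressed copy before the original goes away:

# roughly what "gzip somefile" does: write the compressed data to a new
# file, then unlink the original only if that step succeeded
gzip -c somefile > somefile.gz && rm somefile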

If you have sufficient RAM, you could write a script to temporarily compress the files in a tmpfs filesystem, then remove the original on disk and replace it with the compressed version. Maybe something like this:

# some distributions mount /dev/shm as tmpfs; use bzip2 instead of gzip below if you prefer
if gzip -q9c /full/disk/somefile > /dev/shm/somefile.gz
then
    rm -f /full/disk/somefile && mv -i /dev/shm/somefile.gz /full/disk
fi

Just be mindful of your memory usage, since tmpfs is essentially a RAM disk. A large output file could easily starve the system of memory and cause other problems for you.
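As a rough guard, you could check up front that the tmpfs has at least as much free space as the uncompressed file, since in the worst case gzip barely shrinks it. A sketch, assuming POSIX df and du; the paths are the same placeholders as above:

src=/full/disk/somefile
avail_kb=$(df -Pk /dev/shm | awk 'NR==2 { print $4 }')
need_kb=$(du -k "$src" | cut -f1)
if [ "$need_kb" -lt "$avail_kb" ]; then
    gzip -q9c "$src" > /dev/shm/somefile.gz &&
        rm -f "$src" &&
        mv -i /dev/shm/somefile.gz /full/disk
else
    echo "not enough room in /dev/shm for $src" >&2
fi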

Solution 3:

There is no tool that works this way, for precisely the reason you give. Few people are willing to write a tool that deliberately implements risky behavior.

Solution 4:

The split and csplit commands could be used to split the large file into smaller parts and then compress them individually, for example as sketched below. Reassembling would be rather time-consuming, though.
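A minimal sketch of that approach, assuming GNU split and gzip; bigfile and the 100M chunk size are placeholders. Note that split writes copies of the data, so you still need free space while the parts are being created:

# carve the file into 100M pieces, then drop the original copy
split -b 100M bigfile bigfile.part.
rm bigfile
# gzip compresses the parts one at a time, removing each part once its .gz is written
gzip bigfile.part.*
# reassemble later with: zcat bigfile.part.*.gz > bigfile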