What is a good compression algorithm that can be used for Oracle .DMP files?
I have done some comparisons using a 4800 MB .DMP file, with these results:
- deflate, normal compression: 799 MB
- bzip2, normal compression: 749 MB
- PPMd, normal compression: 654 MB
Then I ran some tests on a smaller .DMP file of 72 MB:
- deflate, normal compression: 21 MB
- bzip2, normal compression: 15.4 MB
- bzip2, ultra compression: 15.4 MB
- LZMA (aka 7zip), normal compression: 14.3 MB (dictionary: 16 MB, word size: 32)
- PPMd, normal compression: 13.3 MB
- LZMA (aka 7zip), maximum compression: 12.3 MB (dictionary: 32 MB, word size: 64)
- PPMd, ultra compression: 11 MB
- LZMA (aka 7zip), ultra compression: 9.1 MB (dictionary: 64 MB, word size: 64)
- LZMA2 (also available for the 7z format), ultra compression: 9.1 MB
All these tests were done with 7-Zip, and the compression levels refer to the default presets available in 7-Zip.
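For reference, the same methods and presets can also be selected from the 7-Zip command line. This is only a sketch (archive and file names are placeholders; -mx=5 is the normal preset, -mx=9 is ultra):
7z a -t7z -m0=deflate -mx=5 dump_deflate.7z export.dmp
7z a -t7z -m0=bzip2 -mx=5 dump_bzip2.7z export.dmp
7z a -t7z -m0=ppmd -mx=5 dump_ppmd.7z export.dmp
7z a -t7z -m0=lzma -mx=5 dump_lzma.7z export.dmp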
Some conclusions and observations so far
LZMA and LZMA2 seem to be the best options for .DMP files. I have seen other people report this in other forums too. In 7-Zip, LZMA only supports 2 CPUs, while LZMA2 supports more (it will also use more memory). Since the compression results look identical, selecting LZMA2 should result in faster compression.
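To illustrate (again a sketch with placeholder names), LZMA2 with an explicit thread count can be requested like this:
7z a -t7z -m0=lzma2 -mx=9 -mmt=4 dump_lzma2.7z export.dmp
As noted above, plain LZMA will only use up to 2 threads, while LZMA2 splits the input into chunks and can keep more cores busy, at the cost of extra memory per thread.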
Now let's play with word and dictionary size for ultra compressed LZMA2 files (command-line equivalents are sketched after the list)...
- Word size 273 / dictionary 64 MB: 8.8 MB
- Word size 273 / dictionary 128 MB: 8.8 MB
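Roughly, those two runs map to command lines like these (a sketch; -md sets the dictionary size, -mfb the word size, and the names are placeholders):
7z a -t7z -m0=lzma2 -mx=9 -md=64m -mfb=273 dump.7z export.dmp
7z a -t7z -m0=lzma2 -mx=9 -md=128m -mfb=273 dump.7z export.dmp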
Final conclusions
7-Zip, 7z, LZMA2, or whatever you want to call it, is the way to go.
If you have the patience, going for ultra compression pays off. Going for an even higher word size also pays off, but going for a bigger dictionary size might not (in this test case).
I'd be curious to get some averages of compression ratios from people trying this. I'll try to remember to post results once I'm done compressing some bigger archives.
New conclusions
I've been fine-tuning this, and had the best results time/ratio-wise with these settings (see the command sketch after the list):
- Compression level: ultra
- Compression method: LZMA2
- Dictionary size: 48 MB (going larger has no effect at all, going smaller increases the resulting filesize significantly)
- Word size: 12 (going up to 16 doubles the compression time, with only a minor filesize improvement; going up to 128 triples the compression time, again with "only" a 23% better end result. Could do this if every MB counts...)
- Solid block size: solid (since we are compressing one, or just a few, huge files, this speeds up compression and has no effect on the resulting filesize)
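Putting those settings on the command line gives roughly the following (a sketch; archive and file names are placeholders, and -ms=on turns on solid mode):
7z a -t7z -mx=9 -m0=lzma2 -md=48m -mfb=12 -ms=on dump.7z export.dmp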
Oracle dump files use a proprietary binary format, so you want a standard compression algorithm that handles binary data well. bzip2 is good with binary data, so I would recommend that.
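For example (a minimal sketch; the file name is a placeholder), a dump can be compressed with bzip2 like this:
bzip2 -k -9 export.dmp
This produces export.dmp.bz2; -9 selects the largest block size (the default) and -k keeps the original file.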
HERE is a great article I used when making a backup script on some Linux machines. It worked really well. I ended up using rzip for my needs, to back up and compress all types of data files including SQL database dumps.
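As a sketch of that approach (the paths are placeholders, and the exact rzip switches are an assumption on my part: -9 for maximum compression, -k to keep the input), rzip compresses a single file, so a typical run tars everything first:
tar -cf backup.tar /path/to/dumps
rzip -9 -k backup.tar
This leaves backup.tar.rz alongside the original tarball.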
Building on @Wouter's answer, I was able to achieve a good result with the following options (approximately 3 times smaller dictionary and 10 times larger word size; I can't confirm the impact on compression time):
7z a -r -ms=e4g -mqs+ -m0=LZMA2:a=0:d=18m:mf=hc4:fb=128:mc=5:lc=4:lp=0:pb=0 -bt outfile.7z @items_to_archive.txt
It gives a compression ratio of 0.12 (compressed size / original size) and is 18...20% faster than -mx4. Higher compression can be achieved with a=1 and higher values for d, fb and mc, at the expense of increased compression time.
For comparison, WinRAR's "Best" method with default options results in a 0.11 compression ratio and is approximately 6x slower, while the "Fastest" method gives a 0.17 compression ratio and is 1.5 times faster. RAR archives are 3x faster to extract, though.
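For reference, those WinRAR runs correspond roughly to the following commands (a sketch; -m5 is the "Best" method, -m1 is "Fastest", and the names are placeholders):
rar a -m5 dump_best.rar export.dmp
rar a -m1 dump_fast.rar export.dmp
unrar x dump_best.rar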