What settings to use when making 7zip files in order to get maximum compression when compressing PDFs?
What settings should I use when making 7zip files in order to get maximum compression? I'm compressing PDF documents containing scanned images. I'm thinking about using LZMA2, but I don't know what to set for dictionary size, word size, etc. Also, would LZMA or PPMd be better options?
I need to transfer some files (~200MiB) over the net, and upload speeds here are very slow, so I want to compress the data as much as possible. CPU time consumed is not very important.
EDIT
Here's what I got after testing various compression methods:
Uncompressed size was: 25,462,686B
My processor is an Intel Core 2 Duo T8100 and I have 4GiB of RAM.
Best compression was with PeaZip using the PAQ8O algorithm. Resulting file size was 19,994,325B. Settings used were compression level: maximum. Unfortunately, speed of compression was around 5KiB/s, so it took more than one hour to compress the data.
Next was the experimental PAQ9O compressor. Using it, I got 20,132,660B in about 3 minutes of compression. Unfortunately, the program is command-line only, and not many other programs use that compression algorithm. It also uses around 1.5GiB of RAM with the settings I used (a -9 -c).
After that was 7-Zip 9.15 beta (2010-06-20) using LZMA2. Using it, I got 20,518,802B in about 3 minutes. Settings used were word size 273, dictionary size 64MB and I used 2 threads for compression.
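For reference, the GUI settings above map roughly onto the command line below. This is only a sketch: it assumes a `7za` executable on the PATH, the archive name `scans.7z` and input directory `./pdfs` are made-up placeholders, and `-mx=9` (Ultra) is an assumption, since the compression level used in the test is not stated.

```shell
#!/bin/sh
# Hypothetical names: scans.7z (output archive) and ./pdfs (input directory).
ARCHIVE="scans.7z"
INPUT_DIR="./pdfs"

# -t7z      : 7z archive format
# -m0=lzma2 : use the LZMA2 method
# -mx=9     : Ultra compression level (assumed, not stated in the test)
# -mfb=273  : word size ("fast bytes") of 273
# -md=64m   : dictionary size of 64 MB
# -mmt=2    : 2 compression threads
SEVENZIP_ARGS="a -t7z -m0=lzma2 -mx=9 -mfb=273 -md=64m -mmt=2"

if command -v 7za >/dev/null 2>&1 && [ -d "$INPUT_DIR" ]; then
    # SEVENZIP_ARGS is left unquoted on purpose so it splits into separate switches.
    7za $SEVENZIP_ARGS "$ARCHIVE" "$INPUT_DIR"
else
    # Dry run when 7za or the input directory is missing.
    echo "would run: 7za $SEVENZIP_ARGS $ARCHIVE $INPUT_DIR"
fi
```

A 64 MB dictionary needs several hundred megabytes of RAM while compressing, so it may be worth lowering `-md` on machines with less memory.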
Now back to my original question: In my case solid block size didn't produce any noticeable results. Increasing word size did produce some results. Difference between highest word size and smallest was 115,260B. I believe that such savings do justify efforts needed to make two necessary clicks and change word size.
I tried using other compression algorithms supported by 7zip and PeaZip and they produced files in sizes from 19.8MiB to 21.5MiB.
In the end my conclusion is that when compressing PDF documents containing mostly images, the effort needed to use exotic compression algorithms isn't justified. Compression using LZMA2 in 7zip produced quite acceptable results in the least amount of time.
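To put the three measured results in perspective, the byte counts from the tests can be turned into percentages of the original size. This is a small sketch using only the figures quoted in the question; the function name `ratio` is made up for illustration, and it assumes a POSIX shell with `awk` available.

```shell
#!/bin/sh
# Uncompressed size in bytes, copied from the test results.
ORIGINAL=25462686

# Print a compressed size as a percentage of the original, to two decimals.
ratio() {
    awk -v c="$1" -v o="$ORIGINAL" 'BEGIN { printf "%.2f", 100 * c / o }'
}

# Compressed sizes in bytes, labelled as in the test results.
echo "PAQ8O (PeaZip): $(ratio 19994325)% of original"
echo "PAQ9O:          $(ratio 20132660)% of original"
echo "LZMA2 (7-Zip):  $(ratio 20518802)% of original"
```

The spread between the best and the fastest result is only about two percentage points, which is consistent with the conclusion that LZMA2 is good enough for this kind of data.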
Solution 1:
The content of the PDFs (text & images) is probably already compressed -- so there's not going to be much to gain by trying to compress them again.
Solution 2:
Try precomp - it first decompresses the already compressed data inside of your PDFs. Then 7z can do its magic on uncompressed data.
Also try nanozip which I have verified to be very effective, yet very efficient (400kb/s at compression ratios of PAQ algorithms).
Solution 3:
7za a -t7z -mx=9 -mfb=258 -mpass=15 filename.7z subdir
Adjust the first word as necessary for the name of your command line executable, and adjust the parts after "-mpass=15" to customize your filename and what it should include.
This answer is not specific to PDF documents.
This uses LZMA, not PPM. I've stayed away from PPM because there are too many variations that are not compatible with other variations. LZMA looks to be more stable, with compatibility being more widely supported. So I've stayed away from PPM precisely because my opinion was, as you've stated, "the effort needed to use exotic compression algorithms isn't justified."