Zip deflate percentage

Running zip archive file, I got:

adding: file (deflated 40%)

-rw-rw-r-- 1 lenduya lenduya 757 Jan 18 16:26 archive.zip
-rw-rw-r-- 1 lenduya lenduya 973 Jan 18 16:25 file

My question is how zip arrived at the 40%. 973/757 is 1.28 and 757/973 is 0.77. 757/(973-757) is 3.5 and 973/(973-757) is 4.5.

Bonus: The objective is a script that takes a file as its argument and prints the compression ratio zip achieves on it. My plan was to redirect zip's summary to a file and then filter out the relevant part with cut or tr, letting bc deal with the float formatting. Am I on the right path, or is there a much simpler way?


Solution 1:

First question: the 40%. That is how much "space" was removed from the source file when it was compressed, measured against the original file size. The size of the .zip file also includes overhead, such as CRC values, the internal file index, etc. The smaller the source file, the larger the share of the archive that overhead takes up.

To find the compressed size of the file, without overhead, use unzip to list the contents:

unzip -v archive.zip

Your example probably used ~173 bytes, or ~23% of the archive, for overhead. An 18K file I tried here used about the same overhead, 162 bytes, or ~0.2% of the zip file size.

The math for your case is: compressed size ~584 bytes, space saved 973-584=389 bytes, compression ratio 584/973=60%, or deflation ratio 389/973=40%, overhead 757-584=173 and 173/757=23%.
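
If you want to check that arithmetic yourself, here is a small sketch. The function name zip_stats is made up for illustration, and the 584 is the approximate compressed size from above, so treat the printed figures as approximate too. It uses awk for the floating-point math; bc would work as well.

```shell
# Hypothetical helper: given original size, compressed size, and archive size
# (all in bytes), print the space saved, the compressed size as a share of the
# original, and the archive overhead.
zip_stats() {
    awk -v orig="$1" -v comp="$2" -v arch="$3" 'BEGIN {
        printf "saved %.1f%%\n", (orig - comp) / orig * 100
        printf "compressed to %.1f%%\n", comp / orig * 100
        printf "overhead %d bytes (%.1f%% of archive)\n", arch - comp, (arch - comp) / arch * 100
    }'
}

# Figures from this example; 584 is the approximate compressed size.
zip_stats 973 584 757
# saved 40.0%
# compressed to 60.0%
# overhead 173 bytes (22.9% of archive)
```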

Bonus section: Output.

You can read that output and parse it if you wish. The deflated percentage is rounded to a whole number, so it is only as close as it can be without decimal places. If you process one file, that's not too bad. If you process several in one archive, it gets more interesting, though it is still possible. A better option might be the unzip command above. If you run it on your archive, you'll see that it lists the file's size and compressed size twice. The second time is a summary for the archive, which is one file in this case. If you have multiple files, the summary is the combined total of space saved, as a percentage of the original file sizes.

Since you are a student, I'll leave the parsing work to your imagination as an exercise to hone your skill.
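
If you get stuck, here is a hint to compare your own attempt against. It is only a sketch, and it assumes the Info-ZIP layout of unzip -v, where the last line is the archive summary and its first two fields are the total original size and the total compressed size. The function name saved_percent and the sample line are made up for illustration (the sample reuses the approximate figures from this example). It recomputes the percentage from the two sizes, so you get decimals instead of the rounded value.

```shell
# Hypothetical helper: read `unzip -v` output on stdin and print the space
# saved with one decimal place. Assumes the last line is the summary:
#   <total length> <total compressed size> <ratio> <count> file(s)
saved_percent() {
    tail -n 1 | awk '$1 > 0 { printf "%.1f%%\n", ($1 - $2) / $1 * 100 }'
}

# Hand-written sample summary line, not real unzip output:
printf '%s\n' '     973              584  40%                            1 file' | saved_percent
# 40.0%

# With a real archive (assumes archive.zip exists and unzip is installed):
#   unzip -v archive.zip | saved_percent
```

Wiring this to the script's argument, creating the archive, and handling errors are still yours to work out.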

Luck.