What is the practical value of redundancy in zip files?

How is redundancy implemented in zipping, and how do I benefit from it?

I'd assume it consists of something like storing the lookup tables twice so a single defect does not invalidate the whole rest of the file. Regarding the use case, possibly when storing the file on a CD that gets a slight scratch?

However, I personally have never benefited from adding redundancy to zip files, and I tend to omit it, so I was wondering whether it is actually useful in practice.


Solution 1:

I believe you're talking about the adaptive LZ (Lempel–Ziv) algorithm. The term "redundancy" doesn't refer to anything being duplicated while the zip file is built; it comes from how this method of compression works.

To illustrate, here's an example. Let's say I had a document containing the phrase:

It is what it is because that's what it is

If I wanted to make this phrase shorter by exploiting redundancy, I would first make a dictionary containing all the words that are repeated, like so:

1 = it
2 = is
3 = what

And then I would rewrite the sentence as:

12312becausethats312

If I then want to compress it further, I can add the following to my dictionary:

312 = x
12 = y

So that the sentence becomes:

yxbecausethatsx

As you can see, the more redundancy you remove this way, the greater the compression. But you're also increasing the likelihood of corruption: as the dictionary grows, more and more of the file depends on it, and if any portion of the dictionary gets damaged, everything encoded with it can no longer be read.
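The substitution scheme above can be sketched in a few lines of Python. This is only a toy model of the idea: real zip compression (DEFLATE) works on raw byte sequences with back-references and Huffman coding, not a word dictionary, and the function names here are made up for illustration. The sentence is lowercased and unpunctuated to keep the rules simple.

```python
def substitute(text, rules):
    """Compress: apply each (phrase, code) rule in order, shrinking the text.
    Longer phrases must come before codes they contain (e.g. "3 1 2" before "1 2")."""
    for phrase, code in rules:
        text = text.replace(phrase, code)
    return text


def restore(text, rules):
    """Decompress: undo the rules by applying them in reverse order.
    If any rule (dictionary entry) is lost or damaged, every occurrence
    of its code in the text stays unreadable."""
    for phrase, code in reversed(rules):
        text = text.replace(code, phrase)
    return text


sentence = "it is what it is because thats what it is"

# Pass 1 (the word dictionary) followed by pass 2 (codes for repeated runs).
rules = [("it", "1"), ("is", "2"), ("what", "3"),
         ("3 1 2", "x"), ("1 2", "y")]

compressed = substitute(sentence, rules)
print(compressed)  # y x because thats x

assert restore(compressed, rules) == sentence
```

Note that the compressed text is useless without the rule list: dropping even one entry from `rules` before calling `restore` leaves stray codes in the output, which is the fragility described above.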