Do encrypted compression containers like ZIP and 7-Zip compress or encrypt first?
The discussion of "compress and then encrypt, or vice-versa" led me to ponder the following question: many compression containers, like ZIP, 7z, and rar, support encrypting these containers. For example, when creating a 7z file in 7-Zip, the program lets you enter an encryption password.
For these file types, are the files compressed and then encrypted, as recommended in the aforementioned question, or the reverse? Or, is there some way that these can compress and encrypt the data at the same time?
When I create an encrypted 7z file, I can view the filenames inside of the encrypted archive, but I cannot view the contents of those files without entering the passphrase. How is this possible? As an aside, is there any way to encrypt a 7z or similar archive such that the file names and directory structure within are not visible without using the passphrase?
I would prefer answers with definitive sources/references, not just speculation. We can all make guesses about this, but if somebody can show me documentation proving that it works one way or another, that would be ideal.
I would assume that 7-Zip and other archiving tools compress before they encrypt, for the reasons stated in the linked blog post. But I was unable to find any documentation that confirms that, nor could I immediately ascertain it from looking at the 7-Zip source code.
However, I can explain why filenames aren't encrypted. As you might be aware, the 7z format contains a header with the file information and other metadata. 7-Zip will not encrypt this header unless you explicitly enable it. You can do this by checking the "Encrypt file names" box at the bottom of the Encryption section of the archive creation dialog on Windows.
On Linux and other Unix-like operating systems (and presumably the command line 7-Zip tool on Windows), you can enable header encryption by adding a -mhe=on switch to the 7z command.
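As a rough sketch of what that invocation might look like when scripted, here is a small Python helper that only builds the argument list. The function name `build_7z_command` and the archive and path names are hypothetical; the sketch assumes a `7z` binary on your PATH and uses the `a` (add) command, a bare `-p` so that 7z prompts for the password instead of taking it on the command line, and the `-mhe=on` switch mentioned above.

```python
from typing import List, Sequence


def build_7z_command(archive: str, paths: Sequence[str],
                     encrypt_names: bool = True) -> List[str]:
    """Build (but do not run) a 7z command line for an encrypted archive.

    Hypothetical helper for illustration; it assumes `7z` is on PATH.
    """
    # "a" adds files to an archive; a bare "-p" asks 7z to prompt for
    # the password rather than exposing it in the process list.
    cmd = ["7z", "a", "-p"]
    if encrypt_names:
        # Also encrypt the archive header, which holds the file names.
        cmd.append("-mhe=on")
    cmd.append(archive)
    cmd.extend(paths)
    return cmd


if __name__ == "__main__":
    # Print the command so it can be inspected before running it.
    print(" ".join(build_7z_command("secret.7z", ["docs", "notes.txt"])))
```

To actually create the archive, you could pass the returned list to `subprocess.run(cmd, check=True)`. Without `-mhe=on`, anyone can still list the file names in the resulting archive.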
I would prefer answers with definitive sources/references, not just speculation.
Oh, you can do much better than that. You can try it for yourself and base your conclusion on logic and facts. There's really no need to speculate here.
All these programs do compress first then encrypt and that is a fact that you can easily verify by yourself.
Take compressible data, like a huge number of .txt text files (say ASCII text files).
1. Only compress these .txt files and look at the resulting file size.
2. Now compress and encrypt the .txt files using the aforementioned programs and look at the file size.
3. Now encrypt the .txt files first, then try to 'compress' the encrypted file and look at the file size.
What will this experiment show? Results 1 and 2 will be basically the same size, while 3 will be about the same size as your uncompressed data.
Because one of the guarantees made by encryption algorithms is that encrypted data will look random (if it doesn't, your encryption algorithm is broken and that is a fact too).
And you can't compress randomness.
That's even better than references: it's the "try it and see for yourself".
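If you would rather not set up the archivers just to see the effect, the same experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `zlib` stands in for the archiver's compressor, and `toy_encrypt` is a hypothetical, insecure stand-in cipher (XOR with a SHA-256 counter keystream) used only because its output looks random, like real ciphertext does. It is not what 7-Zip or any ZIP tool actually uses.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Illustration only, NOT secure; a stand-in for a real cipher.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def run_experiment(plaintext: bytes, key: bytes = b"hypothetical-key") -> dict:
    """Return the output size in bytes for each step of the experiment."""
    compressed = zlib.compress(plaintext, 9)
    return {
        "original": len(plaintext),
        "compress_only": len(compressed),
        "compress_then_encrypt": len(toy_encrypt(compressed, key)),
        "encrypt_then_compress": len(zlib.compress(toy_encrypt(plaintext, key), 9)),
    }


if __name__ == "__main__":
    # Highly repetitive ASCII text, i.e. very compressible data.
    text = b"The quick brown fox jumps over the lazy dog.\n" * 20000
    for name, size in run_experiment(text).items():
        print(f"{name:>22}: {size} bytes")
```

With compressible input, the compress-only and compress-then-encrypt sizes should come out identical and far below the original, while encrypt-then-compress should stay at roughly the original size.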
Fact 1: good encryption algorithms produce seemingly random data
Fact 2: random data cannot be compressed
So it's obvious that if you got a file size smaller than the total of all the files' size then compression took place before encryption.
Also, it is totally obvious that if you "compress and encrypt" a set of compressible files and do not end up with a size gain, then your "compress and encrypt" software is broken beyond repair and can safely be thrown away as garbage written by clueless people ;)
That's the fun thing with facts: you cannot argue with facts and you cannot be wrong when stating facts.
P.S.: Don't try this with already-compressed files, such as a set of .png files. That wouldn't work, because they barely compress any further.
For these file types, are the files compressed and then encrypted, as recommended in the aforementioned question, or the reverse? Or, is there some way that these can compress and encrypt the data at the same time?
My first question is why, but this is something you'd want to hit the technical docs for (either source code, patents, or the like). The idea behind the zip software is that they solve the problem and you don't have to think about it.
When I create an encrypted 7z file, I can view the filenames inside of the encrypted archive, but cannot view the contents of those files without entering the passphrase. How is this possible?
The contents of the files are encrypted, but the directory (the listing of file names, the relative locations of the encrypted file data and the file attributes) is not.
As an aside, is there any way to encrypt a 7z or similar archive such that the file names and directory structure within are not visible without using the passphrase?
Sure. Use any other file encryption software: TrueCrypt, OpenSSL's various tools, etc.