Compress and then encrypt, or vice-versa?

I am writing a VPN system which encrypts (AES256) its traffic across the net (Why write my own when there are 1,000,001 others already out there? Well, mine is a special one for a specific task that none of the others fit).

Basically I want to run my thinking past you to make sure I'm doing this in the right order.

At the moment packets are just encrypted before being sent out, but I want to add some level of compression to them to optimize the transfer of data a little. Not heavy compression - I don't want to max out the CPU all the time - but I want to make sure the compression is going to be as efficient as possible.

So, my thinking is, I should compress the packets before encrypting as an unencrypted packet will compress better than an encrypted one? Or the other way around?

I will probably be using zlib for the compression.



Solution 1:

If the encryption is done properly then the result is basically random data. Most compression schemes work by finding patterns in your data that can be in some way factored out, and thanks to the encryption now there are none; the data is completely incompressible.

Compress before you encrypt.
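
A quick way to check this for yourself is to run zlib over some repetitive plaintext and over random bytes of the same length. This is only a sketch: the sample request text is made up, and `os.urandom` is used as a stand-in for AES output, on the assumption that well-encrypted data looks random to a compressor.

```python
import os
import zlib


def compressed_ratio(data: bytes, level: int = 1) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Made-up, repetitive plaintext, loosely resembling protocol traffic.
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 100

# Stand-in for ciphertext (assumption: good cipher output looks random).
ciphertext_like = os.urandom(len(plaintext))

plain_ratio = compressed_ratio(plaintext)
cipher_ratio = compressed_ratio(ciphertext_like)

print(f"plaintext ratio:       {plain_ratio:.3f}")
print(f"ciphertext-like ratio: {cipher_ratio:.3f}")
```

Level 1 is the fastest zlib setting, which fits the "not heavy compression" requirement in the question. The random input should come out slightly larger than it went in, because zlib still adds its header and checksum.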

Solution 2:

Compress before encryption. Compressed data can vary considerably for small changes in the source data, therefore making it very difficult to perform differential cryptanalysis.

Also, as Mr.Alpha points out, if you encrypt first, the result is very difficult to compress.

Solution 3:

Although it depends on the specific use case, I would advise encrypt-then-compress. Otherwise an attacker could learn information from the number of encrypted blocks.

Assume a user sends a message to the server, and an attacker is able to append text to the user's message before it is sent (via JavaScript, for example). The user wants to send some sensitive data to the server and the attacker wants to obtain it. The attacker can try appending different strings to the data the user sends. The user's side then compresses the message together with the attacker's appended text. Assume DEFLATE (LZ77) compression, which replaces repeated data with a pointer to its first occurrence. If the attacker's text reproduces the whole plaintext, the compression function shrinks the combined input to roughly the original size plus a pointer. After encryption, the attacker can count the number of cipher blocks and so tell whether the appended data matched the data the user sent. Even if this case sounds a little contrived, it is a serious security issue in TLS. An attack called CRIME uses this idea to leak cookies from a TLS connection and steal sessions.

source: http://www.ekoparty.org/archive/2012/CRIME_ekoparty2012.pdf
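
The length leak described in this answer can be reproduced with zlib alone. The following is a rough sketch, not the actual CRIME attack: the secret value and both guesses are hypothetical, and encryption is left out on the assumption that it hides content but not length (a block cipher only rounds the length up to a whole number of blocks).

```python
import zlib

# Hypothetical secret that the user's side sends; the attacker cannot read it.
SECRET = b"Cookie: session=7f3a9c1e5b2d8a64"


def observed_length(attacker_suffix: bytes) -> int:
    """Size of the compressed message when attacker data is appended.

    Encryption is omitted here: it is assumed to preserve length up to
    padding, so the compressed size is what leaks to an eavesdropper.
    """
    return len(zlib.compress(SECRET + attacker_suffix))


# A guess equal to the secret collapses into a single back-reference.
right_guess = observed_length(b"Cookie: session=7f3a9c1e5b2d8a64")

# A wrong guess only shares the known prefix; the rest is stored literally.
wrong_guess = observed_length(b"Cookie: session=0123456789abcdef")

print("right guess:", right_guess, "bytes")
print("wrong guess:", wrong_guess, "bytes")
```

The correct guess produces the shorter output, which is the signal the attacker looks for. A real attack would recover the secret a character at a time rather than guessing it whole, but the principle is the same.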