Content-Length not sent when gzip compression enabled in Apache?

I would really appreciate some help understanding this Apache behaviour.

My iPhone Objective-C app communicates with PHP on the server using application/json. Gzip compression is enabled on the server and requested by the client.

From my .htaccess:

AddOutputFilterByType DEFLATE text/html text/plain text/xml application/x-httpd-php application/json

For small responses, Apache sets the 'Content-Length' header. For example (these values are output in Objective-C from the response headers):

Connection = "Keep-Alive";
"Content-Encoding" = gzip;
"Content-Length" = 185;     <-------------
"Content-Type" = "application/json";
Date = "Wed, 22 Sep 2010 12:20:27 GMT";
"Keep-Alive" = "timeout=3, max=149";
Server = Apache;
Vary = "Accept-Encoding";
"X-Powered-By" = "PHP/5.2.13";
"X-Uncompressed-Content-Length" = 217;

X-Uncompressed-Content-Length is a header I am adding set to the size of the uncompressed JSON string.

As you can see, this response is very small (217 bytes).

Here are the headers from a larger response (282888 bytes):

Connection = "Keep-Alive";
"Content-Encoding" = gzip;
"Content-Type" = "application/json";
Date = "Wed, 22 Sep 2010 12:20:29 GMT";
"Keep-Alive" = "timeout=3, max=148";
Server = Apache;
"Transfer-Encoding" = Identity;
Vary = "Accept-Encoding";
"X-Powered-By" = "PHP/5.2.13";
"X-Uncompressed-Content-Length" = 282888;

Notice that Content-Length is not given.

My questions:

  1. Why doesn't Apache send the Content-Length for the larger response?
  2. Does the fact that 'Content-Encoding=gzip' is set mean that gzip compression is still working on the larger response, even though I can't verify the size difference?
  3. Is there a way I can get Apache to include the actual Content-Length for these larger responses, so that I can report data usage to the users more accurately?

This app can be used on data plans that are expensive, hence my desire to report the actual usage to the user, not 30-70% inflated usage (a few hundred extra KB may not sound like much – but these plans can cost between $1 and $10 per MB!).

Thanks in advance.


Solution 1:

In addition to Martin Fjordvald's answer:

Apache uses chunked encoding only if the compressed response is larger than the DeflateBufferSize. Increasing this buffer size therefore stops the server from switching to chunked encoding for larger responses, so the Content-Length is sent even for gzipped data.

More information is available here: http://httpd.apache.org/docs/2.2/mod/mod_deflate.html#deflatebuffersize
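
As a rough sketch, the directive could be set as shown below. The 1 MiB value is only an illustrative assumption; it would need to exceed the compressed size of the largest response you expect. Note that the mod_deflate documentation lists DeflateBufferSize as valid in the server config and virtual host contexts only, so this will not work from .htaccess.

```apache
# In httpd.conf or inside a <VirtualHost> block (not .htaccess).
# Illustrative value only: 1 MiB, larger than the biggest compressed response expected.
DeflateBufferSize 1048576
```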

Solution 2:

Sounds like Apache is using chunked encoding. This means it can send the data as it's being gzipped, rather than waiting for the full response to be compressed, so the final size isn't known when the headers go out. It's fairly standard practice; I'm not familiar enough with Apache to say whether it can be disabled, though.

Solution 3:

OK, I managed to solve this. As Martin F correctly points out, Apache is chunking the reply so the content size is not known. For many people this is desirable (page loads faster). This comes at a cost of not being able to report the download progress.

For those like me who really want to report the download progress, if you use Apache or PHP's automatic gzip support, there is little you can do. The solution is to do it manually. It's easier than it sounds:

If you're sending whole files, then this is a great example in PHP to force a single chunk (with the Content-Length): http://www.php.net/manual/en/function.ob-start.php#94741

If you're sending generated data, then use gzencode to encode your data, as in the linked sample. A prerequisite is that all your output data is stored in a variable (if your code echoes its output, you can buffer it with ob_start and then fetch the contents of the buffer).

        // $replyBody is the entire contents of your reply

        header("Content-Type: application/json");  // or whatever yours is
        header("Vary: Accept-Encoding");           // the body depends on the client's Accept-Encoding

        // checks if gzip is supported by client
        $pack = true;
        if(empty($_SERVER["HTTP_ACCEPT_ENCODING"]) || strpos($_SERVER["HTTP_ACCEPT_ENCODING"], 'gzip') === false)
        {
            $pack = false;
        }

        // if supported, gzips data
        if($pack) {
            header("Content-Encoding: gzip");
            $replyBody = gzencode($replyBody, 9, FORCE_GZIP);
        }

        // compressed or not, sets the Content-Length
        // (mb_strlen with a single-byte encoding counts bytes, not characters)
        header("Content-Length: " . mb_strlen($replyBody, 'latin1'));

        // outputs reply & exits
        echo $replyBody;
        exit;

And voila!
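
If your reply is produced by echo statements rather than built up in a variable, the buffering step might look something like the sketch below. The JSON payload is a made-up placeholder standing in for whatever your script really outputs; the resulting $replyBody can then be fed into the code above.

```php
<?php
// Start buffering: anything echoed from here on is held in memory, not sent.
ob_start();

// Placeholder for the code that generates the reply (hypothetical data).
echo json_encode(array('status' => 'ok', 'items' => array(1, 2, 3)));

// Stop buffering and collect everything that was echoed as one string.
$replyBody = ob_get_clean();
```

Because nothing has been sent to the client at this point, the headers (including Content-Length) can still be set afterwards.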

Another great benefit of doing it yourself is that you can set the compression level. This is great for my mobile application, as I can set it to the highest compression level (so my users pay less for data!) – whereas the server probably only uses a medium compression level for a better CPU/size trade-off. As far as I know, the compression level can only be changed if you can edit httpd.conf (which, on shared hosting, I can't).

So I've kept my DEFLATE .htaccess directive for everything but my application/json replies, which I now encode in the above way.
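
Assuming the directive from the question as the starting point, the adjusted .htaccess line would presumably be the same list with application/json removed:

```apache
# .htaccess: application/json dropped, since PHP now gzips those replies itself
AddOutputFilterByType DEFLATE text/html text/plain text/xml application/x-httpd-php
```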

Thanks again Martin F, you gave me the spark I needed to solve this :)