Can I let varnish send old data from the cache while it's fetching a new copy?

I'm caching dynamically generated pages (PHP-FPM, NGINX) and have varnish in front of them; this works very well.

However, once the cache timeout is reached, I see this:

  • new client requests page
  • varnish recognizes the cache timeout
  • client waits
  • varnish fetches new page from backend
  • varnish delivers new page to the client (and has page cached, too, for the next request which gets it instantly)

What I would like to do is:

  • client requests page
  • varnish recognizes the timeout
  • varnish delivers old page to the client
  • varnish fetches new page from backend and puts it into the cache

In my case it's not a site where outdated information is a big problem, especially not when we're talking about a cache timeout of a few minutes.

However, I don't want to punish users by making them wait in line; I'd rather deliver something immediately. Is that possible in some way?

To illustrate, here's a sample output of running siege for five minutes against my server, which was configured to cache for one minute:

HTTP/1.1,200,  1.97,  12710,/,1,2013-06-24 00:21:06
...
HTTP/1.1,200,  1.88,  12710,/,1,2013-06-24 00:21:20
...
HTTP/1.1,200,  1.93,  12710,/,1,2013-06-24 00:22:08
...
HTTP/1.1,200,  1.89,  12710,/,1,2013-06-24 00:22:22
...
HTTP/1.1,200,  1.94,  12710,/,1,2013-06-24 00:23:10
...
HTTP/1.1,200,  1.91,  12709,/,1,2013-06-24 00:23:23
...
HTTP/1.1,200,  1.93,  12710,/,1,2013-06-24 00:24:12
...

I left out the hundreds of requests completing in 0.02 seconds or so. But it still concerns me that some users will have to wait almost 2 seconds for their raw HTML.

Can't we do any better here?

(I came across Varnish send while cache; it sounded similar, but not exactly what I'm trying to do.)

Solution

The answer from Shane Madden contained the solution, but I didn't realize it right away. There was another detail I didn't include in my question because I thought it wasn't relevant; it actually is.

The CMS solution I'm currently using has a varnish database listener and thus the capability to notify varnish to ban pages whose content has changed. It sends a PURGE request with a regex to ban certain pages.
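For illustration, such a purge request might look like the following; the endpoint, port, and regex are hypothetical and depend on the CMS listener and on how the VCL handles PURGE:

```shell
# Hypothetical PURGE request banning all pages under /news/
# (assumes the VCL turns a PURGE into a ban on the request URL;
# adjust host, port, and pattern to your own setup).
curl -X PURGE "http://127.0.0.1:6081/news/.*"
```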

To sum things up, there are two cases where I got unlucky users:

  1. normal varnish TTL of a page expires
  2. backend users change content, which sends a purge request to varnish

In both cases there are "unlucky" users. The second case is alleviated by the fact that backend users usually check the page after changing it, but not always.

Nevertheless, for the second case I created a solution (yes, I realize this question started out seeking an answer for the first case ... a poorly formulated question on my part):

Instead of sending a purge request, I used Shane's suggestion and adjusted the VCL so that my varnish database listener can send a special request to fetch a page with hash_always_miss set to true.

With the current architecture I don't really have the luxury of doing a real asynchronous request, but with the help of How do I make an asynchronous GET request in PHP? I was able to craft a GET request to varnish which does not wait for the page to load but is good enough to trigger varnish to fetch the page from the backend and cache it.
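A shell equivalent of that fire-and-forget request might look like this (URL and port are assumptions; the X-Varnish-Nuke header matches the VCL in Solution 1 below). Backgrounding the process means the caller doesn't wait for the page to load:

```shell
# Trigger a cache refresh without waiting for the response body;
# the header makes varnish treat this request as a cache miss
# and fetch a fresh copy from the backend.
wget -q -O /dev/null --header="X-Varnish-Nuke: 1" "http://127.0.0.1:6081/some/page" &
```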

The net effect was that the database listener sent the request to varnish, and while I was polling the specific page my requests were never "unlucky"; once varnish had fetched the page completely from the backend (anywhere from 300 ms to 2 s), the new version was suddenly there.

I have yet to find a solution for avoiding the same problem when the normal TTL runs out, but I guess the solution is also exactly what Shane suggests: using wget to trigger hash_always_miss. I'll just have to be smart enough to get the list of pages I have to refresh.


Solution 1:

The solution that I've used to solve this problem is to make sure the TTL on a page never has a chance to expire before it's refreshed, forcing an HTTP client running on one of my systems to take the slow load instead of an unlucky client's request.

In my case, this involves wget on a cron job, sending a special header to mark the requests and setting req.hash_always_miss based on it, forcing a new copy of the content to be fetched into the cache.

acl purge {
    "localhost";
}

sub vcl_recv {
    /* other config here */
    if (req.http.X-Varnish-Nuke == "1" && client.ip ~ purge) {
        set req.hash_always_miss = true;
    }
    /* ... */
}

For your content, this might mean setting the Varnish TTL to something like 5 minutes but having a cron'd wget configured to make a cache-refreshing request every minute.
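As a sketch, the crontab entry for that could look like the following (URL and header are assumptions matching the VCL above; the one-minute schedule keeps refreshes well inside a 5-minute TTL):

```shell
# m  h  dom mon dow  command
*    *  *   *   *    wget -q -O /dev/null --header="X-Varnish-Nuke: 1" "http://127.0.0.1:6081/"
```

Since the request comes from localhost, it passes the `client.ip ~ purge` check in the VCL, so only this client can force cache misses.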

Solution 2:

@EDIT:

Just a quick note to let you know that this feature appears to have only just been implemented in the latest version on the master branch; chances are your version may not support true stale-while-revalidate yet. The example I've posted would serve 9999/10000 requests, with one poor bugger still having to wait for the request to complete at the backend (still better than nothing ;) ...


Well, I'm not 100% sure why the previous comments say it's not working, but according to https://www.varnish-software.com/static/book/Saving_a_request.html:

  • req.grace - defines how long overdue an object can be for Varnish to still consider it for grace mode.
  • beresp.grace - defines how long past the beresp.ttl-time Varnish will keep an object.
  • req.grace - is often modified in vcl_recv based on the state of the backend.

I'm currently using a configuration like what the manual says and it's working fine ... Here's a snippet of my VCL file:

sub vcl_recv {
    # Cache rules above here...
    if (req.backend.healthy) {
        set req.grace = 30d;
    } else {
        set req.grace = 300d;
    }
}

sub vcl_fetch {
    # Fetch rules above here ...

    # If backend returns 500 error then boost the cache grace period...
    if (beresp.status == 500) {
        set beresp.grace = 10h;
        return (restart);
    }

    # How long varnish should keep the objects in cache..
    set beresp.grace = 1h;

    # Actual TTL of cache - If an object is older than this an update will be triggered to the backend server :)
    set beresp.ttl = 1m;
}

Note that if you want to provide a longer grace period for backend responses (for 500 errors, as in my config) you will need to set up backend probing ... Here's a copy of my backend probe:

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = { 
        .url = "/nginx-status";
        .timeout = 500 ms; 
        .interval = 3s; 
        .window = 10;
        .threshold = 4;
    }
}