Why cache static files with Varnish, rather than just pass them to the backend?

There are a few advantages to Varnish. The first one you noted is reducing load on the backend server, typically by caching content that is generated dynamically but changes rarely (compared to how frequently it is accessed). Taking your WordPress example, most pages presumably do not change very often, and there are plugins that invalidate the Varnish cache when a page changes (e.g. on a new post, edit, or comment). You can therefore cache indefinitely and invalidate on change, which results in the minimum load on your backend server.
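As a sketch of the Varnish side of that invalidation, the VCL can be as small as this (VCL 4.0; the ACL address is an assumption, adjust it to wherever your WordPress instance runs):

```vcl
vcl 4.0;

acl purgers {
    "127.0.0.1";            # only the local WordPress instance may purge
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Not allowed"));
        }
        return (purge);     # evict the cached object for this URL
    }
}
```

The purge plugins then simply send a PURGE request for the changed URL whenever a post, edit, or comment happens.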

The linked article notwithstanding, most people would suggest that Varnish performs better than nginx if set up properly, although (and I really hate to admit it) my own tests seem to concur that nginx can serve a static file faster than Varnish (luckily, I don't use Varnish for that purpose). I think the problem is that if you end up using Varnish, you have added an extra layer to your setup. Passing through that extra layer to the backend server will always be slower than serving directly from the backend, and this is why allowing Varnish to cache may be faster: you save a step. The other advantage is on the disk I/O front. If you set up Varnish to use malloc storage, you don't hit the disk at all, which leaves it available for other processes (and would usually speed things up).

I think one would need a better benchmark to really gauge the performance. Repeatedly requesting the same single file triggers filesystem caches, which shifts the focus away from the web servers themselves. A better benchmark would use siege with a few thousand random static files (possibly even taken from your server logs) to simulate realistic traffic. Arguably though, as you mentioned, it has become increasingly common to offload static content to a CDN, which means Varnish probably won't be serving it to begin with (you mention S3).
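For instance, something along these lines (the domain and log path are assumptions, and `$7` assumes nginx's default combined log format where field 7 is the request path):

```shell
# Pull a few thousand distinct static URLs out of the nginx access log.
awk '{print "http://example.com" $7}' /var/log/nginx/access.log \
    | grep -E '\.(css|js|png|jpe?g|gif|ico)(\?|$)' \
    | sort -u | head -n 5000 > urls.txt

# Replay them with 50 concurrent clients for one minute.
# --internet makes siege pick URLs from the file at random.
siege --internet -c 50 -t 1M -f urls.txt
```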

In a real-world scenario, you would likely prioritize your memory usage: dynamic content first, as it is the most expensive to generate; then small static content (e.g. JS/CSS); and lastly images. You probably wouldn't cache other media in memory unless you have a really good reason to do so. In this case, with Varnish loading files from memory and nginx loading them from disk, Varnish will likely outperform nginx (note that nginx's caches are only for proxying and FastCGI, and are disk-based by default, although it is possible to use nginx with memcached).

(My quick and very rough test, not to be given any credibility, showed nginx (direct) was the fastest: call its time 100%. Varnish (with malloc) was a bit slower (about 150%), and nginx behind Varnish (with pass) was the slowest (around 250%). That speaks for itself: all or nothing. Adding the extra time (and processing) to communicate with the backend simply suggests that if you are using Varnish and have the RAM to spare, you might as well cache everything you can and serve it from Varnish instead of passing back to nginx.)


I think you might be missing something.

By definition, dynamic files change. Typically, they change by running some sort of database query that affects the content of the page being served up to the user. Therefore, you do not want to cache dynamic content. If you do, it simply becomes static content, and most likely static content that is incorrect.

As a simple example, let's say you have a page with the logged-in user's username at the top. Each time that page is loaded, a database query is run to determine which username belongs to the logged-in user requesting the page, which ensures that the proper name is displayed. If you were to cache this page, the database query would not happen, and all users would see the same username at the top of the page, which likely would not be their own. You need that query to happen on every page load to ensure the proper username is displayed to each user. The page is therefore not cacheable.

Extend that logic to something a little more problematic like user permissions and you can see why dynamic content should not be cached. If the database is not hit for dynamic content, the CMS has no way to determine whether the user requesting the page has permissions to see that page.

Static content is, by definition, the same for all users. Therefore no database query needs to take place to customize that page for each user so it makes sense to cache that to eliminate needless database queries. Images are a really great example of static content - you want all users to see the same header image, the same login buttons, etc, so they are excellent candidates for caching.

In your code snippet above you're seeing a very typical Varnish VCL snippet, which forces images, CSS and JavaScript to be cached. By default, Varnish will not cache any request that carries a cookie. The logic is that if there is a cookie in the request, then the server must need that cookie for some reason, so it is required on the backend and the request must be passed through the cache. In reality, many CMSes (Drupal, WordPress, etc.) attach cookies to almost everything, whether or not they are needed, so it is common to write VCL that strips the cookies from content known to be static, which in turn causes Varnish to cache it.
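That kind of VCL usually looks something like this (a generic illustration of the pattern in VCL 4.0, not necessarily your exact snippet; the extension list and TTL are placeholders):

```vcl
sub vcl_recv {
    # Strip cookies from requests for known-static content so
    # Varnish's default logic will serve it from cache.
    if (req.url ~ "\.(png|gif|jpe?g|ico|css|js|svg|woff2?)(\?.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_backend_response {
    # Also drop Set-Cookie on the responses, and cache them for a day.
    if (bereq.url ~ "\.(png|gif|jpe?g|ico|css|js|svg|woff2?)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1d;
    }
}
```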

Make sense?


Some dynamic content, such as stock quotes, actually changes often (updated every second on a SaaS server from a backend server) but may be queried even more often (by tens of thousands of subscribed clients):

[stock calculation / backend server] ----- [SaaS server] ------ [subscription clients]

In this case, caching the per-second updates from the backend servers on the SaaS server makes it possible to satisfy the queries of those tens of thousands of subscribers.

Without a cache on the SaaS server, this model simply would not work.
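In Varnish terms this is just a very short TTL; a minimal sketch (the `/quotes` endpoint is hypothetical):

```vcl
sub vcl_backend_response {
    if (bereq.url ~ "^/quotes") {
        # One backend fetch per second, no matter how many
        # thousands of clients are polling the same data.
        set beresp.ttl = 1s;
    }
}
```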


Caching static files with Varnish benefits you by offloading Nginx. Of course, if you have lots of static files to cache, it can waste RAM. However, Varnish has a nice feature: it supports multiple storage backends for its cache.

For static files: cache to HDD.
For everything else: cache to RAM.
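Concretely, varnishd accepts multiple -s arguments, and each store can be named (the paths and sizes below are placeholders):

```shell
# File-backed storage for bulky static objects, malloc (RAM) for the rest.
varnishd -a :6081 -f /etc/varnish/default.vcl \
    -s static=file,/var/lib/varnish/static.bin,10G \
    -s hot=malloc,512M
```

Objects are then steered into a particular store from VCL, e.g. by setting beresp.storage_hint (Varnish 4) or beresp.storage (Varnish 5+) in vcl_backend_response; check the documentation for your version.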

This should give you more insight on how to implement this scenario: http://www.getpagespeed.com/server-setup/varnish-static-files-cache