ETag configuration with multiple Apache servers or a CDN / How does Google do ETags?

You can configure Apache so it doesn't use the inode as part of the hash. See the FileETag directive.
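For example, something along these lines in the server or virtual host config should do it. This is a minimal sketch: it assumes the files have the same modification time and size on every server, which depends on how you deploy.

```apache
# Build ETags from modification time and size only, leaving the inode out.
# Inode numbers differ between machines, so including them makes the same
# file get a different ETag on each server.
FileETag MTime Size
```

Older Apache versions included the inode by default, so it is worth setting this explicitly rather than relying on the default.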


Current practice is to remove ETags, for precisely the reasons given in the OP's post. Instead you can rely on the other caching headers, i.e. Cache-Control and Expires, and cache resources unconditionally: assume that static content at a given URL never changes, so when the content has to change, you give it a new URL too. Steve Souders built the case for this while at Yahoo!, and published a good book about this and other performance improvements (High Performance Web Sites).
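In Apache that could look roughly like the fragment below. It is a sketch, not a drop-in config: it assumes mod_headers and mod_expires are loaded, and the MIME types and the one-year lifetime are only illustrative choices.

```apache
# Stop Apache from generating ETags for static files.
FileETag None

<IfModule mod_headers.c>
    # Remove any ETag header that is still present on the response.
    Header unset ETag
</IfModule>

<IfModule mod_expires.c>
    # Far-future expiry for static assets. mod_expires sets both the
    # Expires header and Cache-Control: max-age.
    ExpiresActive On
    ExpiresByType text/css "access plus 1 year"
    ExpiresByType application/javascript "access plus 1 year"
    ExpiresByType image/png "access plus 1 year"
</IfModule>
```

This only works if changed files really do get a new URL, for example by putting a version number or content hash in the file name.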

You can use ETags if you want to; you'll just have to take good care that all servers are configured exactly alike, and that ETags are generated from something that's machine-independent. One way of doing that is to generate ETags from a hash of the file contents, or a hash of (filename + size), as James wrote.
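To illustrate the content-hash idea, here is a small Python sketch. The function name `content_etag` and the choice of SHA-256 are my own assumptions, not something Apache or any CDN does for you; the point is only that the value depends on the bytes of the file and nothing machine-specific.

```python
import hashlib


def content_etag(path, chunk_size=65536):
    """Return a quoted ETag value derived only from the file's bytes.

    Hypothetical helper: any server holding an identical copy of the
    file computes the same value, regardless of inode or mtime.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # ETag values are sent as quoted strings in the HTTP header.
    return '"' + digest.hexdigest() + '"'
```

Hashing on every request is expensive, so in practice you would compute this once at deploy time or cache it alongside the file.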

My guess, without any evidence, is that Google isn't using a third-party CDN; they are just using their own servers in their many datacenters worldwide. They then keep the configuration of their web servers consistent across the globe, and just use something like (last modified time + file size) as the basis of their ETag.

For the rest of us, not using ETags is IMHO simpler and better.