Nginx `keys_zone` size, persistence and maximum number of files

I'm trying to understand nginx caching better, specifically the functionality related to the keys_zone setting.

The documentation says the following:

Note that the size defined by the keys_zone parameter does not limit the total amount of cached response data. Cached responses themselves are stored with a copy of the metadata in specific files on the filesystem.

Is the number of cached files limited by the size set in the keys_zone setting? (Does a one-megabyte keys_zone mean that no more than 8,000 files can be cached?)

I'd also like to understand how restarting the nginx process affects the cache. Is the keys_zone data persisted, or does it not need to be persisted (since the relevant information is also kept in the cached files themselves)?

Does restarting the nginx process effectively clear the cache? (It doesn't seem like it does but I'd like confirmation)

Is the size specified in the keys_zone setting a size for an in-memory cache of the metadata associated with each cached file (where cache misses are read from the relevant file), or is it an authoritative list of which files are in the cache and which are not?

Any information to help understand the effect of this setting better is appreciated, thanks.


Solution 1:

Having looked at the nginx source (src/http/ngx_http_file_cache.c), my interpretation is that keys_zone specifies an in-memory cache of the metadata of the cached files.

The keys_zone size sets the size of this in-memory cache; in this context it limits the number of cached-file metadata entries that can be held in memory at one time.
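The proxy_cache_path documentation gives a rule of thumb that a one-megabyte zone can store about 8 thousand keys, which is where the question's 8,000 figure comes from. A minimal Python sketch of that arithmetic, assuming the 8,000-per-megabyte figure holds; `parse_zone_size` and `estimated_key_capacity` are hypothetical helpers, and the parser only understands plain byte counts and `k`/`m` suffixes:

```python
def parse_zone_size(size: str) -> int:
    """Convert a size such as '10m' or '512k' to bytes (simplified, hypothetical)."""
    units = {"k": 1024, "m": 1024 * 1024}
    suffix = size[-1].lower()
    if suffix in units:
        return int(size[:-1]) * units[suffix]
    return int(size)


def estimated_key_capacity(size: str, keys_per_mb: int = 8000) -> int:
    """Rough number of metadata entries a keys_zone of this size can hold."""
    return parse_zone_size(size) * keys_per_mb // (1024 * 1024)


for zone in ("512k", "1m", "10m"):
    print(zone, estimated_key_capacity(zone))
```

This estimates how many metadata entries fit in memory, not how many bytes of response data can be cached.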

A cleanup function removes selected entries from both the metadata memory cache and the file cache. I couldn't clearly determine which algorithm it uses to select the entries to free.

There is no separate list of the cached files stored on the filesystem.

When nginx checks whether a response is cached, it first looks in the metadata cache for an entry for the request. If one exists, nginx opens the file and sends the cached response from it.

When no metadata entry is found, nginx checks whether a file corresponding to the cache key exists on the filesystem and, if it does, reads the metadata and the cached response from it.
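The "file corresponding to the cache key" is found by hashing the key. The proxy_cache_path documentation states that the file name is the MD5 hash of the cache key, and that the levels parameter (for example levels=1:2) builds subdirectories from the end of that hash. A Python sketch of that mapping; `hashed_name` and `cache_file_path` are hypothetical helpers, and the second key below is only an illustrative value in the default $scheme$proxy_host$request_uri form:

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(cache_key: str) -> str:
    """Cache file name: the MD5 hex digest of the cache key."""
    return hashlib.md5(cache_key.encode()).hexdigest()


def cache_file_path(root: str, name: str, levels: str = "1:2") -> PurePosixPath:
    """Build the subdirectory path described by a levels=1:2 style setting."""
    parts = []
    end = len(name)
    for width in (int(w) for w in levels.split(":")):
        # Each level uses the next `width` characters, counting back from the end.
        parts.append(name[end - width:end])
        end -= width
    return PurePosixPath(root, *parts, name)


# Example hash from the proxy_cache_path documentation:
print(cache_file_path("/data/nginx/cache", "b7f54b2df7773722d382f4809d65029c"))
# /data/nginx/cache/c/29/b7f54b2df7773722d382f4809d65029c

# Path for an illustrative key:
print(cache_file_path("/data/nginx/cache", hashed_name("httpexample.com/example")))
```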

Is the number of cached files limited by the size set in the keys_zone setting? (Does a one-megabyte keys_zone mean that no more than 8,000 files can be cached?)

No, it is not limited by that size. If the metadata cannot be found in the in-memory cache, nginx looks up the cached file on disk.

I'd also like to understand how restarting the nginx process affects the cache. Is the keys_zone data persisted, or does it not need to be persisted (since the relevant information is also kept in the cached files themselves)?

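The metadata cache is empty after nginx is restarted. It is rebuilt from the metadata stored in the cached files; according to the proxy_cache_path documentation, a special "cache loader" process is activated about a minute after startup and loads information about previously cached data into the cache zone.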
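The restart behaviour can be modelled in a few lines of Python. This is not nginx's real file format: the sketch assumes a hypothetical layout in which each cached file starts with one JSON line holding its metadata (standing in for the copy of the metadata nginx keeps in each cache file), and `MetadataIndex` is a made-up stand-in for the keys_zone. A fresh instance starts with an empty index, and a loader pass rebuilds it from the files, similar in spirit to the documented cache loader:

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path
from typing import Dict


class MetadataIndex:
    """Hypothetical stand-in for the keys_zone: an in-memory dict of metadata."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.entries: Dict[str, dict] = {}  # starts empty on every (re)start

    def store(self, key: str, body: str, ttl: float) -> None:
        """Write a cache file whose first line is a copy of its metadata."""
        meta = {"key": key, "expires": time.time() + ttl}
        name = hashlib.md5(key.encode()).hexdigest()
        (self.cache_dir / name).write_text(json.dumps(meta) + "\n" + body)
        self.entries[key] = meta

    def load_from_disk(self) -> None:
        """Rebuild the index from the metadata copy kept in each cached file."""
        for path in self.cache_dir.iterdir():
            with path.open() as f:
                meta = json.loads(f.readline())
            self.entries[meta["key"]] = meta


with tempfile.TemporaryDirectory() as tmp:
    before = MetadataIndex(Path(tmp))
    before.store("httpexample.com/a", "response A", ttl=600)
    before.store("httpexample.com/b", "response B", ttl=600)

    after = MetadataIndex(Path(tmp))  # simulated restart: nothing in memory yet
    empty_after_restart = not after.entries
    after.load_from_disk()            # the files survived, so the index comes back

print(empty_after_restart, sorted(after.entries))
```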

Does restarting the nginx process effectively clear the cache? (It doesn't seem like it does but I'd like confirmation)

No. The cached content remains on the filesystem and is used after the restart, provided it is still valid according to the caching criteria.

Is the size specified in the keys_zone setting a size for an in-memory cache of the metadata associated with each cached file (where cache misses are read from the relevant file), or is it an authoritative list of which files are in the cache and which are not?

Your first alternative describes the functionality exactly.

The nginx file cache and metadata cache work the way layered caches usually do: if the requested data is not found in one layer, it is fetched from the next one.

An example of the HTTP request processing flow for a resource that is not yet cached:

  1. nginx receives a GET /example request and calculates the cache key.
  2. nginx looks up the metadata cache entry for the key. It does not exist.
  3. nginx checks the file cache for a file corresponding to the cache key. There is no cached file.
  4. nginx sends the request to the upstream server.
  5. If the response is cacheable, nginx saves it to the file cache and creates a metadata cache entry.
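These steps can be modelled in Python. This is a toy sketch, not nginx's implementation: `TwoLayerCache`, `handle_get`, the key format and `fake_upstream` are hypothetical, and the filesystem is simulated with a dictionary. It follows the lookup order described in this answer (metadata first, then the file, then the upstream):

```python
import hashlib
from typing import Callable, Dict, Tuple

Upstream = Callable[[str], Tuple[str, bool]]  # uri -> (body, cacheable)


class TwoLayerCache:
    """Toy model: a metadata dict in front of a simulated file store."""

    def __init__(self, upstream: Upstream) -> None:
        self.metadata: Dict[str, str] = {}  # cache key -> file name (keys_zone stand-in)
        self.files: Dict[str, str] = {}     # file name -> body (filesystem stand-in)
        self.upstream = upstream
        self.upstream_calls = 0

    def handle_get(self, uri: str) -> str:
        # 1. Calculate the cache key (illustrative format) and its file name.
        key = "httpexample.com" + uri
        name = hashlib.md5(key.encode()).hexdigest()

        # 2. Metadata entry exists and its file is present: serve from cache.
        if key in self.metadata and name in self.files:
            return self.files[name]

        # 3. No usable metadata entry: check whether the file exists anyway.
        if name in self.files:
            self.metadata[key] = name  # rebuild the metadata entry from the file
            return self.files[name]
        self.metadata.pop(key, None)   # drop a stale entry whose file was deleted

        # 4. Nothing cached: send the request to the upstream.
        self.upstream_calls += 1
        body, cacheable = self.upstream(uri)

        # 5. If cacheable, save the response file and create the metadata entry.
        if cacheable:
            self.files[name] = body
            self.metadata[key] = name
        return body


def fake_upstream(uri: str) -> Tuple[str, bool]:
    return f"response for {uri}", True


cache = TwoLayerCache(fake_upstream)
first = cache.handle_get("/example")   # miss: goes to the upstream
second = cache.handle_get("/example")  # hit: served from the cache
print(first == second, cache.upstream_calls)  # True 1
```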

There are also other flows for the following cases:

  • cached response in file cache + metadata entry in metadata cache
  • cached response in file cache + no metadata entry in metadata cache
  • no cached response in file cache + metadata entry in metadata cache (happens if a file is deleted from the cache directory)
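The four combinations (the uncached flow above plus the three cases listed here) can be summarised as a small lookup table. A Python sketch; `lookup_outcome` is a hypothetical helper, the outcome strings paraphrase this answer, and the handling of the deleted-file case is an assumption, since the answer does not spell it out:

```python
from itertools import product


def lookup_outcome(has_metadata: bool, has_file: bool) -> str:
    """Describe the flow for one combination, following this answer's lookup order."""
    if has_metadata and has_file:
        return "serve the file using the in-memory metadata"
    if has_file:
        return "read metadata from the file, add an entry, serve the file"
    if has_metadata:
        # Assumption: a stale entry (file deleted) is treated as a cache miss.
        return "treat as a miss: fetch from upstream"
    return "fetch from upstream, cache the response if cacheable"


for has_metadata, has_file in product((True, False), repeat=2):
    print(f"metadata={has_metadata!s:<5} file={has_file!s:<5} -> "
          f"{lookup_outcome(has_metadata, has_file)}")
```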

Hope this answers your question.