Use cached file mtime as Last-Modified header value

On nginx 1.10.1 I'm proxying an external website (not under my control) to cache images locally.

My config is the following:

location ~ /cachedimages/(?<productcode>.*)/(?<size>.*)/image.jpg {
    resolver 127.0.0.1;
    proxy_pass             https://www.externalsite.example/api/getImage/?productcode=$productcode&size=$size;
    proxy_cache            imgcache;
    proxy_cache_valid      200  1d;
    proxy_cache_use_stale  error timeout invalid_header updating http_500 http_502 http_503 http_504;

    expires 1M;
    access_log off;
    add_header 'Cache-Control' "public";
    add_header Last-Modified $upstream_http_last_modified;
    add_header X-Proxy-Cache $upstream_cache_status;
}

imgcache is defined as follows:

proxy_cache_path /var/cache/nginx/imgcache levels=1:2 keys_zone=imgcache:10m max_size=1g inactive=24h;

The remote server doesn't send a Last-Modified header:

curl -X GET -I 'https://www.externalsite.example/api/getImage/?productcode=abc123&size=128'
HTTP/1.1 200 OK
Date: Thu, 15 Sep 2016 08:16:07 GMT
Server: Apache
Transfer-Encoding: chunked
Content-Type: image/jpeg

and my server adds some headers, but not Last-Modified:

curl -X GET -I https://www.myserver.com/cachedimages/abc123/128/image.jpg
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 15 Sep 2016 08:33:26 GMT
Content-Type: image/jpeg
Transfer-Encoding: chunked
Connection: keep-alive
Expires: Sat, 15 Oct 2016 08:33:26 GMT
Cache-Control: max-age=2592000
Cache-Control: public
X-Proxy-Cache: HIT

How can I force nginx to read the mtime of the cached file (on a cache hit) and serve it as the Last-Modified header value?


Solution 1:

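The $upstream_http_* embedded variables hold the headers sent by the upstream server, and those headers are stored in the cache together with the response, so they are available on a cache hit as well. You can abuse the Date header sent by upstream (which reflects when the object was fetched) to fill the Last-Modified header sent by your reverse proxy, like this: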

 add_header Last-Modified '$upstream_http_date';

Works as expected:

  Last-Modified: Sun, 22 Apr 2018 08:48:44 GMT
  X-Cached: MISS
  ...
  Last-Modified: Sun, 22 Apr 2018 08:50:05 GMT
  X-Cached: HIT
  ...
  Last-Modified: Sun, 22 Apr 2018 08:50:05 GMT
  X-Cached: HIT
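Applied to the location block from the question, only the Last-Modified line needs to change. A sketch, assuming the rest of that configuration stays as posted:

```nginx
location ~ /cachedimages/(?<productcode>.*)/(?<size>.*)/image.jpg {
    resolver 127.0.0.1;
    proxy_pass             https://www.externalsite.example/api/getImage/?productcode=$productcode&size=$size;
    proxy_cache            imgcache;
    proxy_cache_valid      200  1d;
    proxy_cache_use_stale  error timeout invalid_header updating http_500 http_502 http_503 http_504;

    expires 1M;
    access_log off;
    add_header 'Cache-Control' "public";
    # Upstream's Date header; on a cache HIT this is the value stored with the cached copy
    add_header Last-Modified $upstream_http_date;
    add_header X-Proxy-Cache $upstream_cache_status;
}
```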

More info on $upstream_http_* here: http://nginx.org/en/docs/http/ngx_http_upstream_module.html#variables (look for $upstream_http_name).
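If the upstream might start sending its own Last-Modified header at some point, one option is to prefer that value and fall back to Date only when it is missing. A sketch using a map block, which has to live in the http context; the variable name $proxy_last_modified is hypothetical:

```nginx
# http context: pick upstream's Last-Modified if present, otherwise upstream's Date
map $upstream_http_last_modified $proxy_last_modified {
    ""       $upstream_http_date;            # upstream sent no Last-Modified
    default  $upstream_http_last_modified;   # keep upstream's own value
}

server {
    # ...
    location ~ /cachedimages/(?<productcode>.*)/(?<size>.*)/image.jpg {
        # ... proxy_pass / proxy_cache directives from the question ...
        add_header Last-Modified $proxy_last_modified;
    }
}
```

The map is only evaluated when add_header uses it, i.e. after the upstream (or cached) response headers are known.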

That being said, what you are trying to achieve is, IMHO, generally a bad idea: the reverse proxy has no idea whether the object upstream has been updated since it was last fetched, yet it will tell the downstream client that the object has not been modified since then. That information may well be false.

Of course, there might be reasons why you want to do it anyway, e.g. if you have full control over every object update happening upstream and/or if you plan to flush the reverse proxy's cache manually every time it is needed.

If you have a single reverse proxy, I strongly recommend that you look into ETags as a better solution to your problem. If you have a pool of reverse proxies, using ETags effectively gets complicated.