Setting up a reverse proxy to cache images

I wrote a quick Python server to serve resampled images. For example, a URL might look something like http://images.domain.com/resample/100x100/9f362e1994264321.jpg. Since resampling images is expensive, a caching layer is necessary. An nginx reverse proxy seems like a good option for this, and here and here seem like good places to start.
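
Roughly, the server looks like this (a minimal sketch of what I mean, using Pillow and the stdlib http.server; paths and names are illustrative):

import io
import os
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

from PIL import Image  # Pillow

ORIGINALS = "/home/images/originals"  # illustrative originals directory

class ResampleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. /resample/100x100/9f362e1994264321.jpg
        m = re.match(r"^/resample/(\d+)x(\d+)/([0-9a-f]+\.jpg)$", self.path)
        if not m:
            self.send_error(404)
            return
        width, height, name = int(m.group(1)), int(m.group(2)), m.group(3)
        src = os.path.join(ORIGINALS, name)  # original lookup simplified here
        try:
            img = Image.open(src)
        except FileNotFoundError:
            self.send_error(404)
            return
        img.thumbnail((width, height))  # the expensive step worth caching
        buf = io.BytesIO()
        img.save(buf, format="JPEG")
        body = buf.getvalue()
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), ResampleHandler).serve_forever()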

However, there's a problem. There are millions of images, so if http://images.domain.com/resample/100x100/9f362e1994264321.jpg is stored on the filesystem as /home/nginx/cache/resample/100x100/9f362e1994264321.jpg (or something of the sort), cache/resample/100x100/ will eventually contain millions of files, which will make file lookups very inefficient.

I deal with this problem when storing the original images by distributing them among many subdirectories, e.g. 9f/36/9f362e1994264321.jpg. However, I'm not sure how I might do the same with nginx. I could change the URL to match, and I will if that's the only solution, but I'd rather keep the URL as pretty as possible.
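
For what it's worth, the sharding itself is trivial, e.g. (an illustrative sketch):

import os

def sharded_path(root, filename):
    # 9f362e1994264321.jpg -> root/9f/36/9f362e1994264321.jpg
    return os.path.join(root, filename[:2], filename[2:4], filename)

Since the filenames are hex digests, the leading characters spread the files roughly evenly over up to 256 subdirectories per level.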

Can I do this with nginx? If not with nginx, can I do it with something else, like Varnish?


Instead of googling irrelevant links, you should have read the documentation for ngx_http_proxy_module.

The proxy_cache directive is exactly what you need. The configuration should look something like this:

http {

    # ...

    # levels=1:2 spreads cached files over two directory levels so that no
    # single directory accumulates millions of entries; keys_zone sizes the
    # shared memory zone for cache keys; max_size caps the on-disk cache;
    # inactive evicts entries not requested within 720 minutes (12 hours).
    proxy_cache_path /var/www/cache levels=1:2 keys_zone=imgcache:10m max_size=1000m inactive=720m;
    # The temp path should live on the same filesystem as the cache so that
    # completed responses can be moved into place rather than copied.
    proxy_temp_path /var/www/cache/tmp;

    # ...

    server {

        # ...

        location /resample {
            proxy_pass          http://bla.bla.my.backend;
            proxy_cache         imgcache;
            #proxy_cache_key    $scheme$proxy_host$request_uri;
            #proxy_cache_valid 200 302 60m;
            #proxy_cache_valid 404 10m;
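            # Optional: expose the cache status (MISS/HIT/EXPIRED) so the
            # setup can be verified from response headers.
            #add_header         X-Cache-Status $upstream_cache_status;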
        }

        # ...

    }

}
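
If you also uncomment an add_header X-Cache-Status $upstream_cache_status; line in the location (an optional addition shown above), you can verify the cache by requesting the same image twice with curl -sI: the first response should say MISS and the second HIT.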

A two-level directory structure will be created under /var/www/cache, and the cached response for http://mysite.com/resample/dir/file.jpg will be stored in a file named after the MD5 hash of the proxy_cache_key value. For example, if you uncomment the proxy_cache_key $scheme$proxy_host$request_uri; line above (this is also nginx's default key), the response will be cached to the file /var/www/cache/f/08/8db24849a311cc3314955992686d308f

That's because MD5("http://bla.bla.my.backend/resample/dir/file.jpg") = 8db24849a311cc3314955992686d308f, and levels=1:2 is translated into a directory structure by counting characters from the end of the hash: ...08f --> f/08/md5value.
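
You can check yourself where a given key will land; a quick sketch (the key string is the one from the example above; verify the exact string nginx builds from your proxy_cache_key, since the variables are concatenated directly):

import hashlib

key = "http://bla.bla.my.backend/resample/dir/file.jpg"  # example key from above
digest = hashlib.md5(key.encode()).hexdigest()

# levels=1:2: the first directory level is the last character of the hash,
# the second level is the two characters before it.
print("/var/www/cache/%s/%s/%s" % (digest[-1], digest[-3:-1], digest))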


"which will make file lookups very inefficient."

This sounds like premature optimization.

You've not provided any information about what OS this is running on. Since you mention Varnish, I assume this is some flavour of Unix; assuming it's Linux (although most of this applies to other OSes too)....

Have you actually measured it and compared it with the path-rewriting approach? If you are seeing a degradation then you are likely running on a very old filesystem (or one which has been upgraded by partial patching). With ext4 or BTRFS I wouldn't expect to see a measurable difference.
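
Measuring is cheap. For example, a rough sketch (the directory path is illustrative and assumed non-empty) that times stat() calls on a sample of filenames in one directory:

import os
import random
import time

directory = "/home/nginx/cache/resample/100x100"  # illustrative hot directory
names = os.listdir(directory)
sample = random.sample(names, min(1000, len(names)))

start = time.perf_counter()
for name in sample:
    os.stat(os.path.join(directory, name))
elapsed = time.perf_counter() - start

# Rough numbers only: listdir() itself warms the dentry cache, so compare
# both layouts (flat and sharded) under the same conditions.
print("avg stat: %.1f us" % (elapsed / len(sample) * 1e6))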

But that's rather beside the point. Reverse proxies know they could be caching lots of files, and will not necessarily map URL paths directly to filesystem paths.

You will run into problems with very large numbers of files managed by the cache, but those are down to the VFS layer (the kernel's dentry and inode caches) rather than to the directory layout. Decreasing vm.vfs_cache_pressure should help.
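
On Linux that is a one-line change, e.g. (run as root; 50 is illustrative, the default is 100, and lower values make the kernel keep dentry/inode caches longer):

# Equivalent to: sysctl -w vm.vfs_cache_pressure=50
with open("/proc/sys/vm/vfs_cache_pressure", "w") as f:
    f.write("50")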