Setting up a reverse proxy to cache images
I wrote a quick Python server to serve resampled images. For example, a URL might look something like http://images.domain.com/resample/100x100/9f362e1994264321.jpg. Since resampling images is expensive, a caching layer is necessary. An nginx reverse proxy seems like a good option for this, and here and here seem like good places to start.
However, there's a problem. There are millions of images, so by storing http://images.domain.com/resample/100x100/9f362e1994264321.jpg in the filesystem as /home/nginx/cache/resample/100x100/9f362e1994264321.jpg (or something of the sort), eventually cache/resample/100x100/ will have millions of files in it, which will make file lookups very inefficient.
I deal with this problem when storing the original images by distributing them among many subdirectories, e.g., 9f/36/9f362e1994264321.jpg. However, I'm not sure how I might do the same with nginx. I could change the URL to do likewise, and I will if that's the only solution, but I'd rather keep the URL as pretty as possible.
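For reference, the subdirectory sharding described above can be sketched in Python (shard_path is a hypothetical helper, not part of any library; the real layout depends on how the originals are stored):

```python
import os

def shard_path(root: str, filename: str, depth: int = 2, width: int = 2) -> str:
    """Build a sharded path such as root/9f/36/9f362e1994264321.jpg from the
    leading characters of the filename, so no single directory holds
    millions of entries."""
    parts = [filename[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *parts, filename)

print(shard_path("/home/images", "9f362e1994264321.jpg"))
# /home/images/9f/36/9f362e1994264321.jpg
```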
Can I do this with nginx? If not with nginx, can I do it with something else, like Varnish?
Instead of googling irrelevant links, you should read the documentation for ngx_http_proxy_module. The proxy_cache directive is exactly what you need. The configuration should look something like this:
http {
    # ...
    proxy_cache_path /var/www/cache levels=1:2 keys_zone=imgcache:10m max_size=1000m inactive=720m;
    proxy_temp_path /var/www/cache/tmp;
    # ...
    server {
        # ...
        location /resample {
            proxy_pass http://bla.bla.my.backend;
            proxy_cache imgcache;
            #proxy_cache_key $scheme$proxy_host$request_uri;
            #proxy_cache_valid 200 302 60m;
            #proxy_cache_valid 404 10m;
        }
        # ...
    }
}
A two-level directory structure will be created in the /var/www/cache folder, and each cached response for http://mysite.com/resample/dir/file.jpg will be saved in a file named after the MD5 hash of the proxy_cache_key value. For example, if you uncomment #proxy_cache_key $scheme$proxy_host$request_uri; above, the response will be cached to the file /var/www/cache/f/08/8db24849a311cc3314955992686d308f, because

MD5 ("http://bla.bla.my.backend/resample/dir/file.jpg") = 8db24849a311cc3314955992686d308f

and levels=1:2 translates to a directory structure built from the hash, counting characters from the end: ...08f --> f/08/md5value
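The key-to-path mapping can be sketched in Python (illustrative only, not nginx's actual code; levels_1_2 and cache_file are made-up names):

```python
import hashlib

def levels_1_2(digest: str) -> str:
    # nginx levels=1:2 counts from the end of the hex digest:
    # the last character names the first-level directory and the
    # two characters before it name the second level.
    return f"{digest[-1]}/{digest[-3:-1]}/{digest}"

def cache_file(cache_dir: str, cache_key: str) -> str:
    # The cache key is whatever proxy_cache_key expands to at request time.
    digest = hashlib.md5(cache_key.encode()).hexdigest()
    return f"{cache_dir}/{levels_1_2(digest)}"

print(levels_1_2("8db24849a311cc3314955992686d308f"))
# f/08/8db24849a311cc3314955992686d308f
```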
Regarding "which will make file lookups very inefficient": this sounds like premature optimization.
You've not provided any information about what OS this is running on. Since you mention Varnish, I assume it is some flavour of Unix, and I'll assume Linux below (although most of this applies to other OSes too).
Have you actually measured it and compared this with the path-rewriting approach? If you are seeing a degradation, then you are likely running off a very old filesystem (or one which has been upgraded by partial patching). With ext4 or btrfs I wouldn't expect to see a measurable difference.
But that's rather beside the point. Reverse proxies know they could be caching lots of files, and will not necessarily map URL paths directly to filesystem paths.
You will run into problems with very large numbers of files managed by the cache, but these are to do with the VFS and how the kernel caches filesystem metadata. Decreasing the vm.vfs_cache_pressure sysctl should help, since it makes the kernel keep dentries and inodes cached longer.