PHP - best way to cache and serve image and video files

I would definitely not advise you to store these files in Memcached or Redis, and I would also not advise you to serve them through PHP.

Varnish is indeed the way to go here.

Why not Memcached & Redis?

Memcached and Redis are distributed key-value stores. They are extremely fast and scalable and are perfect for storing small values that change on a regular basis.

Image and video files are quite large and wouldn't really fit well in these memory-only databases. Keep in mind that Redis and Memcached aren't directly accessible from the web, they are caches that you would call from a web application.

That means there is additional latency running them through an application runtime like PHP.

Why not PHP?

Don't get me wrong, I'm a huge PHP fan and have been part of the PHP community since 2007. PHP is great for building web pages, but not so great for processing binary data.

These types of workloads that you're looking to process can easily overwhelm a PHP-FPM or PHP-CLI runtime.

It is possible to use PHP, but you'll need so many servers to handle video and image processing at large scale, that it will become an operational burden.

Why Varnish?

Varnish is a reverse caching proxy that sits in front of your web application, unlike distributed caches like Memcached and Redis that sit behind your web application.

This means you can just store images and videos on the disk of your webserver and Varnish will cache requested content in memory without having to access the webserver upon every request.

Varnish is built for large-scale HTTP processing and is extremely good at handling HTTP responses of any size at large scale.

Varnish is software that is used by CDNs and OTT video streaming platforms to deliver imagery and online video.

Using video protocols like HLS, MPEG-DASH or CMAF, these streaming videos are chunked up in segments and indexed in manifest files.

A single Varnish server can serve these with sub-millisecond latency with a bandwidth up to 500 Gbps and a concurrency of about 100,000 requests.

The number of machines you need will be far lower than if you did this in PHP.

The Varnish Configuration Language, which is the domain-specific programming language that comes with Varnish, can also be used to perform certain customization tasks within the request/response flow.

The VCL code is only required to extend standard behavior, whereas in regular development languages like PHP you have to define all the behavior in code.
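To give an idea of how little VCL is needed, here is a minimal sketch for caching media files. The backend address, the list of file extensions and the one-day TTL are assumptions for illustration and not values from your setup. Manifest files are left out on purpose, because they typically need a much shorter TTL.

```vcl
vcl 4.1;

# Assumed origin: the web server that has the image and video files on disk.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Hypothetical extension list, adjust it to the media types you serve.
    if (req.url ~ "(?i)\.(jpe?g|png|gif|webp|mp4|ts|m4s)(\?.*)?$") {
        # The built-in VCL bypasses the cache for requests that carry
        # cookies, so strip them for static media.
        unset req.http.Cookie;
    }
    # No return statement: the built-in vcl_recv logic still runs afterwards.
}

sub vcl_backend_response {
    if (bereq.url ~ "(?i)\.(jpe?g|png|gif|webp|mp4|ts|m4s)(\?.*)?$") {
        # A Set-Cookie header would make the response uncacheable.
        unset beresp.http.Set-Cookie;
        # Assumed TTL, pick a value that matches how often your files change.
        set beresp.ttl = 1d;
    }
}
```

Everything that isn't matched by these rules falls back to the built-in behavior, which is the point made above: you only write VCL for the parts you want to change.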

Here are a couple of Varnish-related resources:

  • The Varnish Developer Portal: https://www.varnish-software.com/developers/
  • The Varnish documentation: http://varnish-cache.org/docs/
  • The Varnish 6 By Example book that I wrote: https://info.varnish-software.com/resources/varnish-6-by-example-book

Maybe even Varnish Enterprise?

The only challenge is caching massive amounts of image/video content. Because Varnish stores everything in memory, you'll need enough memory to store all the content.

Although you can scale Varnish horizontally and use consistent hashing algorithms to balance the content across multiple Varnish servers, you'll probably still need quite a number of servers. This depends on the amount of content that needs to be stored in cache at all times.
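As a rough sketch of what that could look like, the shard director from the directors VMOD that ships with Varnish performs consistent hashing. The example assumes a routing tier that sits in front of three Varnish storage servers; the backend names and IP addresses are hypothetical.

```vcl
vcl 4.1;

import directors;

# Hypothetical Varnish storage servers that hold the cached content.
backend cache1 { .host = "192.0.2.11"; .port = "6081"; }
backend cache2 { .host = "192.0.2.12"; .port = "6081"; }
backend cache3 { .host = "192.0.2.13"; .port = "6081"; }

sub vcl_init {
    # The shard director implements consistent hashing.
    new media_shard = directors.shard();
    media_shard.add_backend(cache1);
    media_shard.add_backend(cache2);
    media_shard.add_backend(cache3);
    # Build the hash ring after all backends have been added.
    media_shard.reconfigure();
}

sub vcl_recv {
    # Hash on the URL so a given file always lands on the same server.
    set req.backend_hint = media_shard.backend(by=URL);
}
```

Because every URL maps to one storage server, each object is only stored once across the cluster, and adding or removing a server only remaps a part of the content.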

If your origin web platform is powerful enough to handle requests for uncached long-tail content, Varnish could store the hot content in memory and trigger cache misses for that long-tail content. That way you might not need a lot of caching servers. This mainly depends on the traffic patterns of your platform.

The open source version of Varnish does have a file storage engine, but it behaves really poorly and is prone to disk fragmentation at large scale. This will slow you down quite significantly as write operations increase.

To tackle this issue, Varnish Software, the commercial entity behind the open source project, came up with the Massive Storage Engine (MSE). MSE tackles the typical issues that come with file caching in a very powerful way.

The technology is used by some of the biggest video streaming platforms in the world.

See https://docs.varnish-software.com/varnish-cache-plus/features/mse/ for more information about MSE.

Varnish Enterprise and MSE are not free and open source. It's up to you to figure out what would be the cheaper solution from a total cost of ownership point of view: managing a lot of memory-based open source Varnish servers or paying the license fees for a limited number of Varnish Enterprise servers with MSE.