How to prevent I/O load under Linux for video serving?
I'm currently serving conference videos via Nginx on 3 servers, each with 4 cores, enough memory (no swap used) and an 8-drive RAID-10 array. Unfortunately, iostat -xd 1 shows 100% utilisation on all 3 servers, and iotop shows Nginx accounting for 99-100% of that I/O.
I've been thinking about distributed filesystems (but which one, and would it help?). Are there any other ways to prevent this without just buying new servers (with all the overhead involved...)?
Note that it is not possible to fit the videos in memory; there are too many and they are too big. Requests are also spread too evenly across the catalogue for keeping only a few videos in memory to help.
Solution 1:
My job is building large (>1m user) commercial VoD systems. If you can't utilise multicast/anycast and you don't use a CDN, then you have just one option: scale up your storage systems and networking to handle the maximum concurrent I/O load you need.
Certainly local caching, as you allude to, can help, but I always size our streamers to assume zero caching. Obviously our use cases are going to be different, but if you have a comparatively small video catalogue you could consider putting your content on SSD-based volumes and/or PCIe-based flash storage such as FusionIO kit (there are other manufacturers of this kind of thing these days, but as FIO were the first on the block they're the ones I know and trust best).
When we built out my first significant platform of this type we ended up using literally tens of thousands of 72GB 2.5" SAS disks just to ensure we had enough random-read capability, because that's what you need. Logic says that you'd care more about sequential performance because the video content is just big files, but when you have so many people playing different videos from different start points, that sequential advantage (and any caching that relies on it) pretty much goes out of the window. You need the best random-read performance you can get.
It's also important to ensure you have as efficient a path as possible from disk to network port. There's no easy way to optimise this other than to understand your disks, controllers, buses, NICs and drivers.
Filesystem changes are unlikely to get you past this problem by the way.
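To make the "size for zero caching" approach above a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from any real sizing tool: the function names and every input figure (2000 concurrent viewers, 2 Mbit/s streams, 128 KiB reads, 150 random-read IOPS per spindle) are assumptions chosen purely for illustration, so substitute numbers measured on your own hardware.

```python
import math


def required_read_iops(concurrent_streams, bitrate_mbps, read_size_kib):
    """Estimate random reads per second if every read misses the cache.

    Assumes each stream is read in fixed-size chunks and that reads from
    different viewers do not merge, which is the pessimistic zero-cache case.
    """
    bytes_per_sec = bitrate_mbps * 1_000_000 / 8          # per stream
    reads_per_stream = bytes_per_sec / (read_size_kib * 1024)
    return concurrent_streams * reads_per_stream


def disks_needed(total_iops, iops_per_disk):
    """Number of spindles needed to serve total_iops random reads."""
    return math.ceil(total_iops / iops_per_disk)


if __name__ == "__main__":
    # All figures below are illustrative assumptions, not measurements.
    iops = required_read_iops(concurrent_streams=2000,
                              bitrate_mbps=2.0,
                              read_size_kib=128)
    print(f"random-read IOPS needed: {iops:.0f}")
    print(f"disks at 150 IOPS each:  {disks_needed(iops, 150)}")
```

If the disk count this gives is well beyond what your three 8-drive arrays provide, that points to the same conclusion as above: more (or faster) random-read capacity, rather than a different filesystem.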