Virtual machines and I/O-heavy workloads: is it ever sane?

Is it ever sane to use a Virtualized solution when performing I/O heavy workloads?

Yep, very sane indeed. In fact, for most organisations virtual is now the default and running things on physical boxes is very much the exception. We have over 100k VMs of all kinds, and many of them do >40k IOPS with no issue at all.

What are the best practices around this sort of stuff?

The key thing here isn't whether it's virtualised or not; it's understanding your IO needs well and matching the virtual storage resources to them. It's that simple: if you know what you need/want and have the budget to match that with your storage systems, then the virtualisation layer really plays little or no part, unless you're REALLY pushing things of course (I'm talking tens or hundreds of millions of IOPS).
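As a rough illustration of matching IO needs to storage resources, the sketch below adds up the peak IOPS a set of VMs is expected to need and compares it with what a shared datastore can deliver. The VM names, the IOPS figures and the 30% headroom are made-up assumptions for illustration, not measurements from any real system.

```python
def check_datastore(vm_peak_iops, datastore_iops, headroom_pct=30):
    """Return (total demand, usable capacity, fits?) for a shared datastore."""
    demand = sum(vm_peak_iops.values())
    # Keep some capacity in reserve for bursts, rebuilds and growth.
    usable = datastore_iops * (100 - headroom_pct) // 100
    return demand, usable, demand <= usable

# Hypothetical VMs and their measured or estimated peak IOPS.
vms = {"prod-db": 40_000, "file-server": 8_000, "dev-build": 5_000}
demand, usable, fits = check_datastore(vms, datastore_iops=80_000)
print(f"demand={demand} usable={usable} fits={fits}")
```

If `fits` comes out false, the fix is on the storage side (more or faster disks, or moving a VM elsewhere), not in the hypervisor.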

What causes these problems, are there well known system bottlenecks, or is just a question of excessive contention?

Lack of understanding, or trying to do too much with too few storage resources; that's what normally causes people problems.


Is it ever sane to use a Virtualized solution when performing I/O heavy workloads?

Does a database server regularly pulling 1 GB/second of random IO count? We have one here.

Or a virtual file server delivering up to 600 MB/second to an HPC cluster. That one runs off 8 dedicated VelociRaptors in RAID 10.

What are the best practices around this sort of stuff?

Provide plenty of IO. I think this SQL VM has around 8 or 10 dedicated SSDs.

What causes these problems, are there well known system bottlenecks?

People not doing basic math. If the IO subsystem can't handle the load on physical hardware, it won't handle it under virtualization either. If you need a LOT of IO, provide a dedicated storage subsystem of appropriate size.
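The basic math can be sketched like this. It uses the common rule-of-thumb write penalties for RAID levels (2 for RAID 10, 4 for RAID 5, 6 for RAID 6); the per-disk IOPS figure, the write percentage and the required load are assumptions picked for illustration, so substitute measured values for real hardware.

```python
# Rule-of-thumb number of back-end operations per front-end write.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}

def usable_iops(disks, iops_per_disk, raid_level, write_pct):
    """Front-end IOPS an array can sustain for a given write percentage."""
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid_level]
    read_pct = 100 - write_pct
    # Reads cost one back-end operation, writes cost `penalty` of them.
    return raw * 100 / (read_pct + write_pct * penalty)

# Assumed: 8 spinning disks at ~150 IOPS each, 30% writes, RAID 10.
capacity = usable_iops(disks=8, iops_per_disk=150, raid_level="raid10", write_pct=30)
required = 5_000  # hypothetical workload requirement
print(f"array sustains ~{capacity:.0f} IOPS, workload needs {required}")
print("OK" if capacity >= required else "undersized: more spindles or SSDs needed")
```

With these assumed numbers the array falls far short of the requirement, and it would fall just as short with or without a hypervisor in between.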


Besides the basic math and the concept that you still need the same IOs as when non-virtualised, there is also QoS/prioritisation. Most virtualisation platforms offer at least basic support for this, which helps a lot in preventing a misbehaving dev VM from stalling your prod DB.
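To show the idea behind a per-VM IOPS cap, here is a conceptual token-bucket sketch. It is not any hypervisor's actual API or implementation (on a real platform this is normally a per-disk or per-VM setting); the class name, the 500 IOPS limit and the burst size are all hypothetical.

```python
class IopsLimiter:
    """Token bucket: refills at `limit` tokens per second, holds at most `burst`."""

    def __init__(self, limit, burst):
        self.limit = limit
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now):
        """Return True if one IO may be issued at time `now` (in seconds)."""
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.limit)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A hypothetical dev VM capped at 500 IOPS tries to issue 2000 IOs in one second.
dev = IopsLimiter(limit=500, burst=50)
issued = sum(dev.allow(now=i / 2000) for i in range(2000))
print(f"dev VM issued {issued} of 2000 attempted IOs")
```

The dev VM gets roughly its burst allowance plus one second's worth of refill, and the rest of the backend capacity stays available to the prod DB.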