High Load Average with modest CPU Utilization and almost no IO
Solution 1:
Load average is based on the number of processes waiting in the run queue (on Linux, processes in uninterruptible sleep are counted as well). That means that if you have many processes that each use only a fraction of a time slice, you can see a high load average without high CPU utilization.
The best example of this is mail. The amount of CPU time required to send a message is very small, but when thousands of pieces of mail are moving around the system (especially if the mail daemon forks a process to handle each one) the run queue gets very long. It is common to see well-functioning, responsive mail servers with load averages of 25, 50, or even over 100.
For a web server I would use page response time as the primary metric and not worry about load average. Under modern schedulers, a load average of less than twice the number of cores will usually have no negative effects. You may want to experiment with the number of cores per VM versus the total number of VMs. Some applications benefit from many cores on a few machines; others do better with a small number of cores and many instances.
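The "less than twice the number of cores" rule of thumb above can be checked with a few lines of shell. This is only a sketch: the function name load_within_rule is made up for illustration, and the factor of 2 is simply the rule of thumb from this answer, not a hard limit.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the given load average
# is below twice the given core count, and fails otherwise.
load_within_rule() {
    load=$1    # e.g. the first field of /proc/loadavg
    cores=$2   # e.g. the output of nproc
    # awk does the floating-point comparison; the shell itself cannot.
    awk -v l="$load" -v c="$cores" 'BEGIN { exit !(l < 2 * c) }'
}
```

On a live system it could be called as load_within_rule "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)" && echo "within rule of thumb". Page response time should still be the deciding metric.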
Solution 2:
If we use the following shell commands to watch the instantaneous load next to the load average, we may get a different view of this phenomenon: procs_running can be much higher than expected.
while true; do cat /proc/loadavg; grep procs /proc/stat; sleep 1; done
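To make the output of that loop easier to compare over time, each sample can be condensed to a single line. The sketch below assumes the usual Linux layout of /proc/loadavg (1-minute load in the first field) and /proc/stat (procs_running and procs_blocked lines). The function name sample_load and its optional file arguments are made up for illustration; the arguments exist only so the function can be tried against saved copies of the files.

```shell
#!/bin/sh
# Hypothetical helper: print one line with the 1-minute load average and
# the instantaneous counts of running and blocked processes.
sample_load() {
    loadavg_file=${1:-/proc/loadavg}
    stat_file=${2:-/proc/stat}
    # First space-separated field of loadavg is the 1-minute average.
    load1=$(cut -d ' ' -f 1 "$loadavg_file")
    # procs_running: tasks currently runnable; procs_blocked: tasks waiting on IO.
    running=$(awk '$1 == "procs_running" { print $2 }' "$stat_file")
    blocked=$(awk '$1 == "procs_blocked" { print $2 }' "$stat_file")
    printf 'load1=%s running=%s blocked=%s\n' "$load1" "$running" "$blocked"
}
```

Calling sample_load once per second (with sleep 1 between calls) shows how spiky procs_running is compared with the smoothed load average.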
Solution 3:
I've been dealing with a scenario very similar to yours. In my case, the load average dropped after changing the IO scheduler of the problematic VM's block device to the NOOP scheduler. This scheduler is just a FIFO queue, which works well when the hypervisor applies its own IO scheduling algorithms anyway. There is no need to reorder requests twice.
With that said, I'm still dealing with sluggish keyboard events on the problematic VM, so I think I've only removed the high load average without resolving the actual problem. I'll definitely update this answer if I find the root cause.
List the available schedulers (the one in use is shown in [brackets]):
cat /sys/block/sdX/queue/scheduler
noop anticipatory deadline [cfq]
Change it with:
echo noop > /sys/block/sdX/queue/scheduler
To make it persistent, you need to add elevator=noop to your VM's kernel boot parameters.
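To confirm which scheduler each block device is actually using, before and after the change, the bracketed entry can be pulled out of every scheduler file. This is a sketch that assumes the file format shown above, with the active scheduler in square brackets. The function names and the optional directory argument are made up for illustration; the argument defaults to /sys/block.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (active) scheduler from one
# scheduler file, e.g. "noop anticipatory deadline [cfq]" gives "cfq".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Hypothetical helper: print "device scheduler" for every block device
# found under the given directory (default: /sys/block).
list_active_schedulers() {
    sys_block=${1:-/sys/block}
    for f in "$sys_block"/*/queue/scheduler; do
        [ -r "$f" ] || continue          # skip devices without a readable file
        dev=${f#"$sys_block"/}           # strip the leading directory
        dev=${dev%%/*}                   # keep only the device name
        printf '%s %s\n' "$dev" "$(active_scheduler "$f")"
    done
}
```

Running list_active_schedulers with no arguments prints one line per device, which makes it easy to spot a device that is still on its old scheduler after a reboot.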