When to move a virtualized server to physical?
Virtualization has some great benefits, but there are times when a virtualized server needs more performance and should be moved to physical.
My question is: how do you tell when that is the case? I'm looking for measurable data and metrics that show moving a server to its own physical box would make a significant difference to performance. Personally I'm interested in Windows, but presumably the essentials are the same across all platforms.
The one case where I had to carry out a V2P was for an MS SQL box that had been running on two 3.2GHz dual-core CPUs (total CPU 14.4GHz). We had migrated it to an ESX 2.5 cluster where the underlying hardware was newer but had more, slower cores (2.4GHz, IIRC). Allowing for the ~10% virtualization overhead, even with 4 vCPUs this VM could only ever get an effective 8-8.5GHz of aggregate CPU. A 60% CPU peak before migration became 90-100% after it, and the customer wanted headroom, so we reverted to physical. To answer your question specifically: we saw the box running at 100% CPU across the board in both Perfmon and the VI Client. A better solution (in my view) would have been to upgrade to faster CPUs, but there are edge cases like this where that's not economical, especially with the trend towards slower CPUs with more cores that we saw with the introduction of the Opteron and Core CPUs.
With ESX 4 we could bump a box like this up to 8 vCPUs, but that wasn't an option at the time.
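The back-of-envelope sizing above can be sketched in a few lines of Python. This is only a rough model that treats aggregate clock speed as capacity. It ignores per-core efficiency differences between CPU generations and whether the workload actually scales across cores. The function name and parameters are made up for illustration, and the input figures are the ones quoted in this answer.

```python
def projected_vm_utilisation(physical_ghz, peak_util, vcpus, core_ghz, overhead=0.10):
    """Estimate peak CPU utilisation after a P2V, using aggregate GHz only.

    physical_ghz: total clock capacity of the old physical box
    peak_util:    observed peak utilisation on that box (0.0 - 1.0)
    vcpus:        number of vCPUs given to the VM
    core_ghz:     clock speed of the host cores backing those vCPUs
    overhead:     assumed virtualization overhead (0.10 = ~10%)
    """
    demand_ghz = physical_ghz * peak_util               # what the workload used at peak
    vm_capacity_ghz = vcpus * core_ghz * (1.0 - overhead)  # what the VM can deliver
    return demand_ghz / vm_capacity_ghz


# Figures from the case described above: 14.4GHz physical, 60% peak, 2.4GHz host cores.
four = projected_vm_utilisation(14.4, 0.60, vcpus=4, core_ghz=2.4)
eight = projected_vm_utilisation(14.4, 0.60, vcpus=8, core_ghz=2.4)
print(f"4 vCPUs: {four:.0%}, 8 vCPUs: {eight:.0%}")
```

With those inputs the 4 vCPU case comes out at roughly 100% and the 8 vCPU case at roughly 50%, which matches what we saw after the migration and why more vCPUs would have given the headroom the customer wanted.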
As for looking for performance ceilings that might indicate you need to abandon your VM: with a Windows guest on VMware, the combination of Perfmon and the VI Client should be more than up to the task of finding any VMs that are themselves performance limited. Add some SAN analytics to that if you can, but if the SAN shows an issue you will almost certainly be just as well off reworking the storage to isolate and/or enhance the volumes that the VMs' virtual disks are stored on. The same applies to any other OS/hypervisor combination: get whatever internal stats you can, but correlate them with the hypervisor's view of what's going on, because 100% CPU reported within a VM (for example) does not necessarily mean that the hypervisor could never deliver more performance, just that it didn't at that point.
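As a minimal sketch of that correlation step, assume you have exported CPU samples for the same time window from inside the guest (e.g. Perfmon) and from the hypervisor's host-level view (e.g. the VI Client). The function name, the result labels and the 90% threshold are all assumptions chosen for illustration, not anything a tool provides.

```python
from statistics import mean


def classify_cpu_ceiling(guest_cpu_pct, host_cpu_pct, saturated=90.0):
    """Compare in-guest CPU samples with host-level CPU samples for the same interval.

    Both arguments are lists of percentages (0-100). The 'saturated'
    threshold is an arbitrary assumption; tune it to your environment.
    """
    guest_busy = mean(guest_cpu_pct) >= saturated
    host_busy = mean(host_cpu_pct) >= saturated

    if guest_busy and host_busy:
        # The host itself is out of CPU: the VM is competing for a full box.
        return "host saturated"
    if guest_busy:
        # The guest is pegged but the host has headroom: the limit is the
        # VM's allocation (vCPU count, limits, shares), not the hardware.
        return "vm limited by allocation"
    return "no cpu ceiling seen in guest"


print(classify_cpu_ceiling([98, 100, 97], [45, 50, 48]))
```

Only the first outcome points at the hardware. The second suggests the VM could be given more before anyone needs to think about going back to physical.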
I disagree that a virtual server would need to be moved to physical because of performance. Hypervisors are now so close to the metal that there is virtually (pun intended) no performance hit, especially now that many board makers are including hypervisors on the chipset. If you took two servers with identical hardware, one running a single guest under a hypervisor and one running an exact copy of that guest directly on the physical hardware, I think you would be hard pressed to notice a difference in performance.
There are other reasons, though, you may need a physical server rather than virtual. One of them is hardware compatibility. If your application requires non-standard hardware with its own unique bus, you may not be able to run that in a virtual machine.
I'm anxious to hear what others have to say. Great question.
NOTE: We have servers that were virtualized and then put back on the same hardware just to have the snapshot/vmotion capabilities we love.