What happens when a physical machine fails in a virtual environment? [closed]

The specifics depend on which virtualization solution you use, but the idea is that you have a virtual farm: a number of physical hosts, each running several virtual machines. You then spend some of the efficiency you gained by not needing a physical host for every VM on spare capacity, so that the farm has enough headroom to cover the case where a physical machine goes down.
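As a rough illustration of that headroom idea, here is a minimal sketch that checks whether a farm could still hold all of its VMs after losing its largest host. The function name and the capacity figures are made up for the example, and it only compares total RAM; a real capacity plan would also look at CPU, storage, and how the VMs actually pack onto hosts.

```python
def survives_single_host_failure(host_capacity_gb, vm_demand_gb):
    """Return True if total VM demand fits on the farm minus its largest host.

    host_capacity_gb: usable RAM per physical host (hypothetical figures).
    vm_demand_gb: RAM reserved per VM.
    Only totals are compared; packing, CPU, and storage limits are ignored.
    """
    if not host_capacity_gb:
        return False
    # Worst case for a single failure: the biggest host is the one that dies.
    remaining = sum(host_capacity_gb) - max(host_capacity_gb)
    return sum(vm_demand_gb) <= remaining


hosts = [256, 256, 256]   # three hosts with 256 GB each
vms = [32] * 14           # fourteen 32 GB VMs, 448 GB in total

print(survives_single_host_failure(hosts, vms))               # 448 GB vs 512 GB left
print(survives_single_host_failure(hosts, vms + [32] * 3))    # 544 GB vs 512 GB left
```

The first call fits inside the two surviving hosts; the second does not, which is the point at which you would add a host or shed VMs.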

Additionally, you can store the virtual disks (VHDs) for each VM on a common, redundant SAN. The hypervisors on the physical hosts can then be set up to talk to each other and share the memory state of their VMs. There is some latency, and much of that memory will be backed by disk, but if one of the physical hosts goes down you are not even waiting for the VMs from that host to boot back up. Instead, those VMs are automatically distributed among the remaining hosts. The ultimate goal is for these machines to pick up almost where they left off, with little to no downtime. In a sense, all of your VMs are already running on at least two physical hosts.

In practice, hypervisors today can only do this kind of migration one machine at a time, and only when they know it is coming before the host fails. But make no mistake: instant migration on hardware failure is the ultimate goal for all of the major hypervisors.
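To make the "distributed among the remaining hosts" step concrete, here is a small sketch of one possible placement policy: each displaced VM goes to whichever surviving host has the most free RAM. This is an assumption for illustration only, not how any particular hypervisor schedules VMs, and the host names, VM names, and sizes are hypothetical.

```python
def redistribute(failed_host, placement, capacity_gb, vm_size_gb):
    """Move every VM off failed_host onto the surviving host with the most free RAM.

    placement: dict mapping VM name -> host name.
    capacity_gb: dict mapping host name -> usable RAM in GB.
    vm_size_gb: dict mapping VM name -> RAM in GB.
    Returns (new_placement, unplaced_vms).
    """
    # Free RAM on each surviving host before any displaced VM is moved.
    free = {h: cap for h, cap in capacity_gb.items() if h != failed_host}
    for vm, host in placement.items():
        if host != failed_host:
            free[host] -= vm_size_gb[vm]

    new_placement = dict(placement)
    unplaced = []
    displaced = [vm for vm, host in placement.items() if host == failed_host]

    # Place the largest VMs first so they are not squeezed out by small ones.
    for vm in sorted(displaced, key=lambda v: vm_size_gb[v], reverse=True):
        target = max(free, key=free.get) if free else None
        if target is None or free[target] < vm_size_gb[vm]:
            del new_placement[vm]
            unplaced.append(vm)
            continue
        new_placement[vm] = target
        free[target] -= vm_size_gb[vm]
    return new_placement, unplaced


capacity = {"h1": 64, "h2": 64, "h3": 64}
sizes = {"web": 16, "db": 32, "cache": 16, "mail": 8}
before = {"web": "h1", "db": "h1", "cache": "h2", "mail": "h3"}

after, unplaced = redistribute("h1", before, capacity, sizes)
print(after)      # web and db now live on h2 and h3
print(unplaced)   # VMs that did not fit anywhere
```

Anything that ends up in the unplaced list is a VM the farm had no headroom for, which ties back to the capacity planning in the first paragraph.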

This is why you sometimes see a single server virtualized onto its own physical host in a farm. You may not gain any hardware efficiency (you may even lose some performance), but you make up for it in management consistency and built-in high availability.


All virtual servers running on a physical host will go offline if the host has any sort of failure.

That said, most platforms offer a high-availability solution for a single VM. Other times a system is built with multiple nodes to prevent service disruption in the event that one node goes down.

If two VM nodes make up a highly available service, it is possible to configure the hypervisor to ensure that the two nodes do not rely on the same physical infrastructure (fault tolerance). This can go beyond separate physical servers to include different network paths, all the way up to geographically separate locations.
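A quick way to reason about this is to list, for each node, the physical things it depends on and see where the two lists overlap. The sketch below does exactly that; the domain names (host, rack, switch, site) and their values are hypothetical, and a real platform would enforce this through its own placement rules and not through a script like this.

```python
def shared_failure_domains(node_a, node_b):
    """Return the failure-domain names on which both nodes have the same value."""
    common_keys = node_a.keys() & node_b.keys()
    return sorted(k for k in common_keys if node_a[k] == node_b[k])


# Hypothetical placement of the two nodes of one highly available service.
app1 = {"host": "h1", "rack": "r1", "switch": "sw1", "site": "dc-east"}
app2 = {"host": "h2", "rack": "r1", "switch": "sw2", "site": "dc-east"}

print(shared_failure_domains(app1, app2))  # ['rack', 'site']
```

Here the nodes are on different hosts and switches but still share a rack and a site, so a rack-level or site-level outage would take down both.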