High availability virtual machines
I've been reading a lot about high availability virtualization, whether via Hyper-V or VMware. In that context, high availability essentially means that the VM is hosted by a cluster of physical servers (nodes), so if one of the physical servers goes down, the VM can still be served by another physical server. So far so good: the physical cluster and the VM itself are highly available.
However, the service being provided (let's say SQL Server, MSDTC, or any other service) is actually provided by the VM image and the virtualized operating system. So I imagine there is still a point of failure at the virtual layer that isn't accounted for: something could happen within the virtual machine itself that the physical cluster cannot account for, correct? In that instance the physical failover cluster (Hyper-V) or VMware host cannot fail over, because the issue is not with one of the servers in the physical cluster; failing over a physical node would not do any good.
Does this necessitate building a virtual failover cluster on top of the physical one, or is this not necessary?
Alternatively, I suppose you could skip the physical clustering and just cluster at the virtual layer (child-based failover clustering), because that should still survive a physical failure.
See the image below showing parent-based (left), child-based (right), and a combination (center). Is parent-based as far as you need to go, or is child-based more appropriate?
The answer is it depends.
Clustering solutions usually do more than monitor the application layer. Traditionally, a cluster dependency graph will include things like the following (a minimal sketch of these checks follows the list):
- Network / IP availability checks
- Storage / shared volume availability checks
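To make that concrete, here's a rough Python sketch of what those two dependency checks might look like. The cluster IP, port, and shared-volume path are made-up placeholders, and a real cluster agent does far more than this:

```python
import os
import socket

# Hypothetical targets -- substitute your own cluster address and shared volume.
CLUSTER_IP = "10.0.0.10"
SHARED_VOLUME = "/mnt/cluster-shared"

def network_available(host: str, port: int = 445, timeout: float = 2.0) -> bool:
    """Network / IP check: can we open a TCP connection to the cluster address?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def storage_available(path: str) -> bool:
    """Storage check: is the shared volume mounted and writable?"""
    return os.path.ismount(path) and os.access(path, os.W_OK)

for name, ok in [("network", network_available(CLUSTER_IP)),
                 ("storage", storage_available(SHARED_VOLUME))]:
    print(f"{name}: {'OK' if ok else 'FAILED'}")
```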
Running some of these checks inside a VM is horrendously difficult. For example, a Windows 2003 cluster requires a quorum drive on which it takes a SCSI lock to ensure that it is the owner of the resources. On failure it also sends out 'poison packets' to acquire that lock. All of these features are near impossible to implement without an RDM to a LUN.
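If it helps to picture the quorum mechanism, here's a loose Python analogy: a SCSI reservation on the quorum LUN behaves a bit like an exclusive, non-blocking file lock, where exactly one node holds ownership at a time. The lock path is invented for illustration, and this is emphatically not how MSCS is implemented under the hood:

```python
import fcntl  # Unix-only; used here purely as an analogy for a SCSI reservation

def try_acquire_quorum(lock_path: str = "/tmp/quorum.lock"):
    """Return an open handle if this node won ownership, or None if another holds it."""
    handle = open(lock_path, "w")
    try:
        # Non-blocking exclusive lock: the analogue of reserving the quorum LUN.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return handle  # we own the cluster resources while this handle stays open
    except BlockingIOError:
        handle.close()
        return None  # another node owns the quorum resource

owner = try_acquire_quorum()
print("This node owns the resources" if owner else "Standby: quorum held elsewhere")
```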
All of these 'hardware detection' components carry a large overhead within a VM (VM performance is always great for user applications, but anything kernel-based will always incur varying degrees of overhead).
So, in the case of Microsoft Windows 2003 clusters, if I had to virtualise I'd use your 'child' approach.
The ideal setup to strive for is:
- VMware HA for hardware failure detection.
- vSphere application monitoring
Followed by,
- VMware HA
- An application only monitor (without the hardware dependency)
- Make sure anti-affinity rules are in place for the paired VMs so DRS/HA never restart the nodes on the same host (see the placement check sketched after this list)!
Finally
- Child clustering
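On the anti-affinity point above, the placement check itself is simple to sketch. The VM and host names below are hypothetical, and in a real environment you'd express this as a DRS anti-affinity rule rather than scripting it yourself:

```python
from collections import defaultdict

# Hypothetical VM-to-host placement map, e.g. as read from your inventory.
placement = {
    "sql-node-a": "esx-host-01",
    "sql-node-b": "esx-host-01",  # violation: both cluster nodes on one host
    "web-node-a": "esx-host-02",
}

# Anti-affinity groups: VMs that must never share a physical host.
anti_affinity_groups = [("sql-node-a", "sql-node-b")]

def find_violations(placement, groups):
    """Return (vms, host) pairs where paired VMs landed on the same host."""
    violations = []
    for group in groups:
        by_host = defaultdict(list)
        for vm in group:
            by_host[placement[vm]].append(vm)
        violations += [(tuple(vms), host)
                       for host, vms in by_host.items() if len(vms) > 1]
    return violations

for vms, host in find_violations(placement, anti_affinity_groups):
    print(f"Anti-affinity violated: {vms} share {host}")
```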
The physical cluster makes your virtual hardware highly available, i.e. failures of a physical server don't affect any given virtual machine. However, the virtual machine itself can still fail (e.g. the OS crashing, someone shutting down the virtual server), so the service running on top of the virtual machine may still fail at some point (although it's less likely than it would be for the same service running on standalone physical hardware). To mitigate this risk, you create the clustered service, so that the service remains unaffected even if a virtual server fails. Of course, you could achieve more or less the same results if you built the clustered service directly on physical servers.
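A quick back-of-the-envelope calculation shows why the extra clustering layer helps. The availability figures below are made up purely for illustration, and the model is deliberately simplified (the host layer is treated as common to both nodes):

```python
# Illustrative numbers only -- substitute your own measured availabilities.
host_layer = 0.999  # physical cluster keeping the virtual hardware up
vm_layer = 0.995    # the guest itself (OS crashes, accidental shutdowns, ...)

# A single VM needs both layers up: availabilities in series multiply.
single_vm = host_layer * vm_layer

# Two clustered VMs: the service fails only if BOTH guest nodes are down.
clustered = host_layer * (1 - (1 - vm_layer) ** 2)

print(f"Single VM service:  {single_vm:.4f}")  # ~0.9940
print(f"Clustered service:  {clustered:.4f}")  # ~0.9990
```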
Whether you run your clustered service on physical servers or on top of a clustered virtualization platform depends on your requirements. If you don't need a virtualization platform for anything else, or the clustered service needs lots of system resources, then I'd recommend building the cluster on physical hardware. But if your physical hardware has resources to spare, or you already have a virtualization cluster, I'd run the clustered service on virtual machines, because that makes managing the (virtual) hardware a lot easier.
Don't forget to take a reality pill along the way, though.
You need to understand the required uptime for your application and, more importantly, the maximum amount of time your application can be unavailable when it does fail. And it will.
This second point is critical; I've seen a "five nines" application managed by a large systems integrator that was offline for nearly a day because of the complexity of the technology being used to keep it highly available. For day-to-day operational availability, the technology ticked the boxes, but when something went wrong with the config, the folks at the aforementioned company were properly stuck.
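For a sense of scale, here's what each level of "nines" actually buys you as a yearly downtime budget:

```python
# Downtime budget per year for N nines of availability.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for nines in range(2, 6):
    availability = 1 - 10 ** -nines
    budget = MINUTES_PER_YEAR * (1 - availability)
    print(f"{'9' * nines}: {budget:,.1f} minutes of downtime per year")

# Five nines allows roughly 5.3 minutes per year, so a single day-long outage
# blows the budget for well over a century.
```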
Don't get me wrong: clustering, SAN snapshots, VM snapshots, off-site replication, HA lock-step virtualisation, etc. all have their place, but just make sure you choose what's required, not what looks nice and shiny.
I'll step down from my soap box now ;-)