How long does it take to fail over with oVirt/RHEV?
Solution 1:
Failover uses the classic clustering mechanism: a failure is detected (hypervisor unreachable), the hypervisor is fenced (multiple mechanisms and tiers are supported), and the VMs that were marked HA are started on other hosts. The whole process should take about 2 minutes or less, depending on your settings and hardware.
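To make the detect, fence, restart sequence concrete, here is a minimal toy model in Python. It is only a sketch of the general pattern, not oVirt's actual code or API: the names VM, fail_over and the fence callback are made up for illustration, and the round-robin placement is an assumption (a real scheduler picks hosts by its own policy).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VM:
    name: str
    ha: bool              # True if the VM was marked highly available
    host: Optional[str]   # current host; None while the VM is down


def fail_over(vms: List[VM], failed_host: str, healthy_hosts: List[str],
              fence: Callable[[str], bool]) -> List[str]:
    """Fence an unreachable host, then restart its HA VMs elsewhere.

    Returns the names of the VMs that were restarted.
    """
    # Step 2: fence first. Without a confirmed fence the old host might
    # still be running the VMs, so restarting them is not safe.
    if not fence(failed_host):
        return []

    restarted: List[str] = []
    for vm in vms:
        if vm.host != failed_host:
            continue
        vm.host = None  # the VM went down with its host; memory state is lost
        # Step 3: only VMs marked HA are restarted, as a cold boot.
        # Hypothetical round-robin placement over the healthy hosts.
        if vm.ha and healthy_hosts:
            vm.host = healthy_hosts[len(restarted) % len(healthy_hosts)]
            restarted.append(vm.name)
    return restarted


# Step 1 (detection) is assumed to have already flagged "host1" as unreachable.
vms = [VM("db", True, "host1"), VM("batch", False, "host1"), VM("web", True, "host2")]
print(fail_over(vms, "host1", ["host2", "host3"], fence=lambda host: True))  # ['db']
```

Note that the non-HA VM simply stays down, and that nothing is restarted until the fence is confirmed. That ordering is what keeps the same VM from running on two hosts at once.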
This works quite well in oVirt for disaster scenarios, but the VMs come back up as if from a power outage, so all in-flight data will of course be lost. If you care about state, you need to run active-active software on top of your hypervisors; the usual VM failover will not be enough. Still, for most scenarios this is plenty, and being able to turn any software stack into an HA stack simply by marking the VM it is deployed on as HA is a pretty significant advantage.
In short, basic VM HA is a nice feature, but if you really cannot afford any downtime or any loss of in-memory state, you will need software that implements active/active clustering or sharding, or is otherwise distributed, or you will need to go completely stateless so that a lost node does not matter. If you specify the actual software you'll be running, maybe we could advise on what to do with it.