How to safely restart a CentOS 7 KVM host without corrupting guest VMs or data?

I want to run a KVM virtualization server with a few VMs (with different guest OSs) in production. When I need to install kernel or package updates on the host machine, how do I restart the hypervisor without corrupting any VM data (for example, while SQL or other disk writes are in progress on the VMs)?

I know this is a very simple question, but I mainly want to know whether KernelCare would be the best option for keeping the host updated. I want no data corruption or other serious issues, but a minimal interruption for the VMs is OK (this is an application hosting server environment).


Solution 1:

The best bet for upgrading servers is to use live migration. Keep a spare host with a fully upgraded software stack, then live migrate all running VMs to that host. Now you can safely upgrade & reboot the original host. This is how most public clouds handle their host upgrades without introducing downtime to their customer VMs.
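
As a rough illustration of the evacuation step, here is a minimal Python sketch that wraps the `virsh` command line. It assumes that both hosts run libvirt, that the spare host is reachable over SSH, and that the guest disk images sit on storage both hosts can access (otherwise storage has to be copied as part of the migration). The host name and the helper function names are hypothetical.

```python
import subprocess


def migrate_command(domain, dest_host):
    """Build a virsh command that live-migrates one guest to dest_host."""
    return [
        "virsh", "migrate",
        "--live",            # keep the guest running while memory is copied
        "--persistent",      # define the guest persistently on the destination
        "--undefinesource",  # remove the guest definition from the source host
        "--verbose",
        domain,
        "qemu+ssh://{}/system".format(dest_host),
    ]


def running_domains():
    """Return the names of the guests currently running on this host."""
    out = subprocess.check_output(
        ["virsh", "list", "--name", "--state-running"],
        universal_newlines=True,
    )
    return [name.strip() for name in out.splitlines() if name.strip()]


def evacuate(dest_host):
    """Migrate guests one at a time; stop at the first failed migration."""
    for domain in running_domains():
        subprocess.check_call(migrate_command(domain, dest_host))
```

Calling `evacuate("kvm-spare.example.com")` on the host to be rebooted would move each running guest in turn. Migrating one guest at a time keeps a failed migration from affecting more than one VM.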

Of course the guest VMs need their own software upgrades & reboots at some point too.

Solution 2:

I see a few options with different costs, downtime and risk.

  1. Live migration. This gives the lowest downtime and means you only risk one VM at a time. There is some risk of a migration going wrong and causing problems, though, and of course you need a second host box.
  2. Non-live migration. You shut the VMs down one at a time and move them. This may be worth considering if you can't make live migration work in your environment.
  3. Shut down the VMs and start them up again after rebooting the host. Any decent VM platform should be able to send ACPI shutdown signals to VMs, and any decent modern OS should understand what an ACPI shutdown signal means. If you go this route and the guest VMs haven't been rebooted in a while, I would suggest doing a test shutdown and restart of each VM, one at a time, before shutting them all down to reboot the host.
  4. Suspend the VMs to disk and resume them after rebooting the host. The risk here is that changes made during the upgrades may make it impossible to resume. You are then dealing with an unclean shutdown of all your VMs.
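
Options 3 and 4 can be sketched with `virsh` as well. The following Python snippet is only an illustration: it assumes libvirt's `virsh` is on the host's PATH, and the function names, timeout, and polling interval are made up for the example. It requests an ACPI shutdown, then polls `virsh domstate` until the guest reports "shut off" or the timeout expires.

```python
import subprocess
import time


def shutdown_command(domain):
    """virsh command that asks the guest to power off via ACPI (option 3)."""
    return ["virsh", "shutdown", domain, "--mode", "acpi"]


def managedsave_command(domain):
    """virsh command that saves guest memory to disk and stops it (option 4)."""
    return ["virsh", "managedsave", domain]


def is_shut_off(domain, run=subprocess.check_output):
    """True once `virsh domstate` reports the guest as shut off."""
    state = run(["virsh", "domstate", domain], universal_newlines=True)
    return state.strip() == "shut off"


def shutdown_and_wait(domain, timeout=300, poll=5,
                      run=subprocess.check_output, sleep=time.sleep):
    """Request an ACPI shutdown and wait for it; return False on timeout."""
    run(shutdown_command(domain), universal_newlines=True)
    waited = 0
    while waited < timeout:
        if is_shut_off(domain, run=run):
            return True
        sleep(poll)
        waited += poll
    # The guest ignored or has not finished the shutdown request.
    return False
```

If `shutdown_and_wait("db01")` returns `False`, the guest did not power off in time, and it is safer to investigate inside the guest than to reboot the host underneath it. A guest stopped with the `managedsave` command is restored from its saved state the next time it is started with `virsh start`.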