How do I investigate an unresponsive KVM guest?

Solution 1:

Investigating these kinds of problems is really difficult because you'd need to isolate the different features of the setup and test them one by one, which is very hard on such a complex setup, especially as reproducing the problem takes two weeks.

The first thing to try is to configure syslog in the guest to send its logs over the network to a remote syslog service (possibly the one running on the host; you'd need to enable remote reception on that syslog server). This lets you catch errors that never made it into the guest's own log because of a lack of free storage space or sync issues.
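If you just want to check that messages from the guest reach the host before setting up a proper syslog server, a throwaway UDP listener is enough. The following is only a minimal sketch, not a replacement for a real syslog daemon. It assumes the guest forwards plain syslog over UDP; the function names, the bind address, the port and the log path are all made up for illustration.

```python
import socket
import sys
import time


def format_datagram(data, sender_ip, received_at):
    """Turn one raw syslog datagram into a single line for the host-side log."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(received_at))
    text = data.decode("utf-8", "replace").rstrip("\r\n")
    return "%s %s %s\n" % (stamp, sender_ip, text)


def serve(bind_addr, port, log_path):
    """Append every datagram received on bind_addr:port to log_path."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    with open(log_path, "a", encoding="utf-8") as log:
        while True:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            log.write(format_datagram(data, sender_ip, time.time()))
            log.flush()  # keep the host copy current even if the guest hangs


# Hypothetical usage: python3 syslog_sink.py 192.168.122.1 5514 /var/log/guest-remote.log
if __name__ == "__main__" and len(sys.argv) == 4:
    serve(sys.argv[1], int(sys.argv[2]), sys.argv[3])
```

Flushing after every message matters here: the whole point is to still have the last lines on the host when the guest stops responding.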

If that doesn't give any useful info, you can try hooking into the guest's serial console (see here for details) and logging everything that appears there to a file on the host.
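One way to do the logging part is sketched below. It assumes the guest's serial port is backed by a Unix socket on the host (QEMU and libvirt can both be configured that way) and that the guest kernel actually writes to that port, e.g. by booting with console=ttyS0. The socket path, the log path and the function names are hypothetical, so adapt them to your setup.

```python
import socket
import sys
import time


def complete_lines(chunks):
    """Reassemble an iterable of byte chunks into decoded text lines."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            yield line.rstrip(b"\r").decode("utf-8", "replace")
    if buf:
        # The stream ended in the middle of a line; keep what we have.
        yield buf.decode("utf-8", "replace")


def socket_chunks(socket_path):
    """Yield raw data from the console socket until the other side closes it."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(socket_path)
    while True:
        data = sock.recv(4096)
        if not data:
            return
        yield data


def log_console(socket_path, log_path):
    """Append each console line to log_path, prefixed with the host's local time."""
    with open(log_path, "a", encoding="utf-8") as log:
        for line in complete_lines(socket_chunks(socket_path)):
            log.write("%s %s\n" % (time.strftime("%Y-%m-%dT%H:%M:%S"), line))
            log.flush()


# Hypothetical usage: python3 console_log.py /path/to/guest-serial.sock /var/log/guest-console.log
if __name__ == "__main__" and len(sys.argv) == 3:
    log_console(sys.argv[1], sys.argv[2])
```

Note that a line is only written once its newline arrives, so if the guest dies in the middle of a line, that last fragment shows up only when the socket is closed.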