Identifying what caused a server reboot
I have an HP ProLiant DL380p Gen8 running VMware ESXi 5.5. For the past 24 hours it has been rebooting itself at seemingly random intervals. Only a single VM is running, and even if I shut that VM down, the host still reboots. The server is not running out of memory or disk space, and as far as I can tell, it is not overheating. I've tried looking through the log files, but there is just so much to look at.
What are the most important steps in diagnosing this problem (including which settings to check, which files to look at, which specific messages would indicate trouble, whether I should start pulling memory, whether there is a diagnostic CD that does all this for me, etc.)?
I know this is a very broad question. I'm happy to provide log files if necessary to make this more specific to my situation.
Here are a few suggestions.
- Is your iLO connected and configured? It will tell you exactly what's happening with the system. Review the iLO 4 log.
- View the system's IML (Integrated Management Log), available via iLO or the vSphere "Hardware" tab.
- Are there any indicators or error messages on the screen during the crash or at POST?
- Are you using the HP-specific install of ESXi (which includes additional drivers and tools)?
- What version and build number of ESXi are you running?
- If the virtual machine you're running is a Windows Server 2012 or 2008 guest, you may be running into a NIC driver bug.
- Check your power connections. Do you have dual power supplies? Re-seat the power cables one at a time.
- Look at the System Insight LED array on the front of the server to determine whether there's an internal health problem.
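If the sheer volume of log text is the obstacle, a short filter can cut an exported log down to lines worth reading first. This is a minimal sketch in Python, assuming you have saved a log as plain text (for example, IML entries saved from iLO, or a log file copied off the ESXi host). The keyword list is a hypothetical starting point chosen for illustration. It is not a list of the exact messages iLO or ESXi emit, so adjust it to what you actually see.

```python
import re
import sys
from typing import Iterable, List, Sequence

# Hypothetical search terms for illustration only; these are NOT an
# authoritative list of iLO/ESXi error messages. Edit to suit your logs.
DEFAULT_PATTERNS = (
    r"uncorrect",     # e.g. "uncorrectable" memory errors
    r"power supply",
    r"panic",
    r"psod",          # purple screen of death
    r"watchdog",
    r"reboot",
)


def find_suspicious_lines(lines: Iterable[str],
                          patterns: Sequence[str] = DEFAULT_PATTERNS) -> List[str]:
    """Return lines matching any pattern (case-insensitive), without trailing newlines."""
    regex = re.compile("|".join(patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py exported_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for hit in find_suspicious_lines(fh):
            print(hit)
```

The script name `scan_log.py` is arbitrary. Matching is deliberately broad, so expect some noise; the aim is to get from thousands of lines to a few dozen you can read by hand, ideally around the timestamps of the reboots.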