VMware ESXi shutdown triggered by APC UPS connected via USB
I am shipping a bunch of ESXi 5.1 servers to remote offices where they will be powered via APC UPS.
I would like to have the UPS trigger a shutdown of the connected server - I would then rely on the ESXi configuration to take care of the shutdown/suspension of the VMs hosted on it.
I can see that APC have a solution documented using their PowerChute Network Shutdown, but this involves setting up an extra server per office, and requires network cards on each UPS. We are generally using UPS without a network card (e.g. Back-UPS Pro) - they come with a USB connector, and they are readily available in the locations where our offices are.
How can I connect a UPS to an ESXi host via USB, then have ESXi detect a power outage and act accordingly? Has anyone managed to do this?
Solution 1:
According to APC, this is not possible and you need PowerChute Network Shutdown. We tried this a number of times with USB and found no solution.
VMware has info here on using the APC-approved solution.
I would also think a Smart-UPS would be a better choice, since you can fit it with a network card. Naturally it costs more, but if your servers are at all important, that cost should be worth it. It also gives you more monitoring and alerting, which might be useful at a remote site. You also need to ensure sufficient runtime for all VMs to shut down cleanly and then for the host to shut down.
Solution 2:
Yes, it's possible. Here are details of my similar setup.
Hardware configuration: an APC Smart-UPS 1500 connected to the ESXi 5.1 host via USB. A Linux virtual machine runs on this ESXi host, and the UPS is attached to this VM using the ESXi USB passthrough option.
Software configuration: a NUT (Network UPS Tools) master running in the VM, and a native ESXi NUT slave running on the ESXi host.
Shutdown logic: the VM runs the UPS driver usbhid-ups, which is responsible for communicating with the UPS via USB. The upsd process talks to the UPS through the usbhid-ups driver and tracks the UPS state. The upsmon master process running on the same machine connects to upsd and initiates the shutdown. The ESXi host runs a second instance of upsmon, which connects to the same upsd in the VM over the internal network.
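To make the "connects to upsd over the network" part concrete, here is a minimal sketch of what any NUT client does when it asks upsd for the UPS state. It is not part of my setup (upsmon does this for you); it just shows the plain-text NUT protocol on TCP port 3493. The host name and UPS name in the usage note are made-up placeholders.

```python
import shlex
import socket

NUT_PORT = 3493  # default TCP port that upsd listens on


def parse_var_response(line):
    """Extract the value from a reply of the form: VAR <ups> <var> "<value>"."""
    parts = shlex.split(line.strip())
    if len(parts) != 4 or parts[0] != "VAR":
        # upsd answers with "ERR <reason>" when the request fails
        raise ValueError("unexpected upsd reply: %r" % line)
    return parts[3]


def get_ups_status(host, ups_name, port=NUT_PORT, timeout=5.0):
    """Ask upsd for ups.status and return the flags as a list, e.g. ['OB', 'LB']."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        request = "GET VAR %s ups.status\n" % ups_name
        sock.sendall(request.encode("ascii"))
        reply = sock.makefile("r", encoding="ascii").readline()
    return parse_var_response(reply).split()
```

Usage would look like get_ups_status("nut-vm.example.local", "apc"), where both arguments are hypothetical and must match your VM's address and the UPS name in ups.conf. The status flags are OL (on line), OB (on battery) and LB (low battery).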
On power failure the following sequence takes place:
- UPS via usbhid-ups reports to upsd about power failure.
- (optional, useful if you want to shut down after a few minutes instead of waiting for Low Battery) upsmon on the VM starts a 5-minute upssched timer. The timer is cancelled if power is restored.
- When the timer fires or when the UPS reports Low Battery, upsmon sets the FSD (forced shutdown) flag on upsd.
- In a stand-alone NUT configuration, the FSD flag would shut down the machine. Here, however, the shutdown command is replaced by a simple log message such as "I should shutdown now but I am waiting for the host instead", and nothing else happens.
- The FSD flag is also read by ESXi upsmon, which initiates the ESXi host shutdown.
- The ESXi host shuts down all virtual machines one by one. The important thing is that the VM which runs upsd is shut down last (using the ESXi startup/shutdown sequence configuration).
- Important: this VM must have VMware Tools installed. When it receives the guest shutdown command from the host, the vmware-tools shutdown script runs. This script checks for the /etc/killpower flag. If the flag is absent, it does nothing (this means a user-activated Linux shutdown, not a UPS event). But if the flag exists (FSD active), the script sends the UPS a delayed power-down command (say, in 3 minutes).
- After running vmware-tools script the guest VM shuts down.
- ESXi sees that the last VM has powered off and goes down itself (this takes around 1 minute because no other machines are running by now).
- After the remaining 2 minutes, the UPS cuts off the power.
- When power is restored, the ESXi starts and powers on all VMs. The UPS monitoring machine must be started first (the same configuration as for shutdown order).
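The rule for when the FSD flag gets raised can be written down as a small sketch. This is only a model of the logic in the steps above, not code that NUT runs: in the real setup upsmon and upssched make this decision. The function name and the seconds_on_battery parameter are made up for illustration; the 300-second default is the 5-minute timer from the optional step.

```python
def should_force_shutdown(status_flags, seconds_on_battery, timer_seconds=300):
    """Return True when the FSD flag should be raised.

    status_flags: NUT ups.status flags, e.g. ["OB"] or ["OB", "LB"].
    seconds_on_battery: how long the UPS has been on battery (hypothetical input).
    timer_seconds: the optional shutdown timer (5 minutes in this setup).
    """
    flags = set(status_flags)
    if "OB" not in flags:
        # Power is present (or restored): the timer is cancelled, no shutdown.
        return False
    if "LB" in flags:
        # On battery and the UPS reports Low Battery: shut down immediately.
        return True
    # On battery, battery still fine: shut down only once the timer has fired.
    return seconds_on_battery >= timer_seconds
```

For example, should_force_shutdown(["OB"], 120) is False because the timer has not fired yet, while should_force_shutdown(["OB", "LB"], 10) is True regardless of the timer.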
Downloads:
NUT for Linux can be installed from a package.
The native NUT client for the ESXi server can be downloaded using the last link on this page: http://www.networkupstools.org/download.html
Some of my scripts and conf files are here (only changed lines are shown): http://pastebin.com/KkEeanK1
Notes:
Of course there are more details, and it took me some time to get this working as it should, but now it performs very nicely. The setup handles the cases where you shut down the monitoring VM from inside (the vmware-tools script is not run), where the ESXi host initiates the VM shutdown (no /etc/killpower flag, so no UPS load-off), and where ESXi itself is shut down (same as the previous case). The only important thing is to have this VM start as soon as possible after the host boots and to shut it down last, so that the host's shutdown time is predictable: as said above, it is around 1 minute for me, and I reserve 2 more minutes just in case.
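The flag check that separates these cases is tiny. Below is a rough sketch of the idea, not my actual vmware-tools script (that one is a shell script in the pastebin link above). It assumes the delayed power-down is sent with NUT's upsdrvctl shutdown command and that the delay itself comes from the driver settings (for usbhid-ups, the offdelay option in ups.conf). The function names and the run parameter are made up for illustration.

```python
import os
import subprocess

KILLPOWER_FLAG = "/etc/killpower"  # flag file that exists only when FSD is active


def should_power_off_ups(flag_path=KILLPOWER_FLAG):
    """True only for a UPS-triggered shutdown (the flag file is present)."""
    return os.path.exists(flag_path)


def on_guest_shutdown(flag_path=KILLPOWER_FLAG, run=subprocess.call):
    """Hook body for the guest shutdown script.

    Returns True if the UPS was told to cut power, False otherwise.
    `run` is injectable so the logic can be tried without touching a real UPS.
    """
    if not should_power_off_ups(flag_path):
        # Host-initiated VM shutdown or ESXi shutdown without a UPS event:
        # leave the UPS output on.
        return False
    # Assumption: upsdrvctl is on PATH; it asks the driver to power off the
    # load after the driver's configured delay.
    run(["upsdrvctl", "shutdown"])
    return True
```

A shutdown started from inside the VM never reaches this hook at all, which is why it also leaves the UPS alone.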
My UPS monitoring Linux VM is also the Samba/NFS sharing server for backup storage, the NAT/DHCP server for the VMs, and the host for some other lightweight services. It takes around 22MHz of ESXi CPU shares and around 10MB of active RAM when idle. Because it uses NUT, you can power more devices from the same UPS if required, and they can all be shut down gracefully. No PowerChute and/or expensive Network Management Card is required.