How to manage a global VM startup order across the whole datacenter?

Solution 1:

There doesn't seem to be a clean way to fully manage a cold start of a virtual infrastructure once HA is configured on the individual hosts. Enabling HA and DRS seems to disable the Virtual Machine Startup and Shutdown options on the host servers. However, any ordering set before the host is moved into the cluster seems to stick. If the number of hosts is small enough to manage, you can connect the vSphere client to each host individually and set the startup priority there. This actually works in the situation you describe.


Storage comes first!

Once the shared storage is up, I work on the hosts. I've had partial outages where vCenter was virtualized as well. What I do in this case is set automatic boot and ordering options for the most critical systems, typically a domain controller and DNS/DHCP. Remember, vCenter is not likely to be available in the cold-start scenario. If I can fit it into the automatic startup order, I will; otherwise it gets started manually.

From there, I make sure the HA and DRS rules are intact. I usually have anti-affinity rules set for terminal servers, application servers and domain controllers. Once vCenter comes up, most of this gets sorted out.
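The cold-start order described above can be written down as a small dependency map. This is only a planning sketch in plain Python, not anything that talks to vSphere. The component names are hypothetical, and placing vCenter after the domain controller and DNS/DHCP, and the remaining VMs after vCenter, is my assumption about a typical setup rather than something the procedure above mandates.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key lists the things that must be up before it can start.
# All names are hypothetical placeholders; the vCenter and
# "remaining-vms" dependencies are assumptions, adjust to your site.
COLD_START_DEPENDENCIES = {
    "shared-storage": set(),
    "esxi-hosts": {"shared-storage"},
    "domain-controller": {"esxi-hosts"},
    "dns-dhcp": {"esxi-hosts"},
    "vcenter": {"domain-controller", "dns-dhcp"},
    "remaining-vms": {"vcenter"},
}


def cold_start_waves(dependencies):
    """Group components into waves; a wave starts only after all earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(cold_start_waves(COLD_START_DEPENDENCIES), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

Printing the waves gives you a checklist to keep next to the rack for the day vCenter is not there to help.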

I had a lightning strike a few weeks ago that took part of my server room down, including the switch blade containing the storage network. VMware HA brought everything back once the storage switch ports were relocated and reprogrammed.

So, this type of issue falls under a real emergency or a manual effort. I wouldn't expect a hands-off startup of the system environment in the scenario you describe.

Edit:

Two weeks ago, I had a brownout that tripped a UPS and took down two hosts, vCenter and a SAN/NAS device. Everything came back on its own and I didn't have to intervene (I was actually on a plane and got the messages upon landing).

Solution 2:

You can configure a vApp to help with startup & shutdown order.

To borrow from this vApp thread:

If your cluster experiences a catastrophic failure, you have a couple of options to ensure VM restart priority. I like to create vApps for this, and drag/drop the VMs in question into the vApp. Let's say you want your database server to start before your web server, so you drag them both into your new vApp. You can right-click the vApp --> Edit Settings --> Start Order tab --> then you'll see Group 1 and Group 2. At the bottom of the window, notice the text "All entities in the same group are started before proceeding to the next group. Shutdown is done in the reverse order." You can move your servers between the groups using the arrows beside the box (I circled them in the attached image). Finally, VMware lets you dictate whether the VMs in Group 2 (and Group 3, Group 4, etc.) should start after a set number of seconds, or whether the next group should start once VMware Tools (a service) has started.

vApp Start Order settings
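To make the group semantics quoted above concrete, here is a minimal Python model of them: groups start in ascending order and shut down in reverse. It is a sketch for reasoning about the order only. It does not call any VMware API, it does not model the per-group delay or the wait for VMware Tools, and the VM names (`db01`, `web01`, `web02`) are hypothetical.

```python
def startup_plan(groups):
    """Return (group_number, vm) pairs in the order a vApp would start them.

    `groups` is a list of lists: groups[0] is Group 1, groups[1] is Group 2, ...
    Every VM in a group is started before the next group is touched.
    """
    plan = []
    for number, group in enumerate(groups, start=1):
        for vm in group:
            plan.append((number, vm))
    return plan


def shutdown_plan(groups):
    """Return (group_number, vm) pairs for shutdown: the last group goes first."""
    plan = []
    for number in range(len(groups), 0, -1):
        for vm in groups[number - 1]:
            plan.append((number, vm))
    return plan


if __name__ == "__main__":
    # Hypothetical vApp: the database in Group 1, the web servers in Group 2.
    vapp_groups = [["db01"], ["web01", "web02"]]

    print("Startup:")
    for number, vm in startup_plan(vapp_groups):
        print(f"  Group {number}: {vm}")

    print("Shutdown:")
    for number, vm in shutdown_plan(vapp_groups):
        print(f"  Group {number}: {vm}")
```

With the database in Group 1 and the web servers in Group 2, the database comes up first and goes down last, which is the behaviour the Start Order tab describes.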