I am seeking recommendations for shared storage options to support an ESXi HA cluster (note that I'm NOT asking for a product/brand/model recommendation - I know this is against the rules here). I am asking for a technology recommendation.

The company I work for is a small business. At the moment, we have one HP DL380 G9 with DAS, with ESXi 6.0, running our custom-developed application. We are now looking at how to achieve HA/FT using the most economical option. We need HA/FT because I'm a one-man IT team and I am often away traveling, so manual failover/restore is not an option.

I understand we need a minimum of two ESXi hosts (physical servers) and shared storage to achieve HA/FT. This is, I think, where it gets interesting: even the cheapest entry-level storage array out there is probably overkill for us. Our storage capacity requirement is probably around 200GB, and we don't see that doubling for at least 5 years. Yet, we need the shared storage for HA/FT.

I would really appreciate any recommendations on my options. Thanks.


Solution 1:

General notes (stream of consciousness):

  • Think really hard about what you're trying to protect.
  • Nobody uses VMware Fault-Tolerance. Okay, maybe someone does, but there are too many restrictions, and the use case is particularly narrow.
  • Servers are more reliable than you expect, especially when working with quality systems like HP ProLiant. Supermicro would be another story...
  • Assess realistic failure modes. An HP ProLiant Gen9 server isn't just going to fail.
  • You may encounter individual component failures, but there are enough internal redundancies to deal with most issues gracefully.
    • Seriously, redundant power supplies, redundant fans, RAIDing of internal disks, the onboard NIC and FLR adapters rarely fail.
    • Add iLO monitoring, comprehensive hardware health checks, and the range of uptime-impacting items is reduced to DIMM failures and system board problems.

So now we come to shared storage. Shared storage becomes a point of failure, depending upon how it's architected.

  • Something like an MSA SAS-attached array is an option and can work with VMware and two hosts. You can buy them bare and add the requisite capacity.
  • A shared-nothing setup would be beneficial in some respects, but adds certain complexities.
  • There are hyperconverged options like VMware vSAN, the HPE StoreVirtual VSA or StarWind's Virtual SAN offering.
  • The HPE VSA may be free for up to 1TB of storage for your setup.
  • An entry-level SAN isn't that compelling considering your space requirements are incredibly low.
  • It's possible to go with single-headed storage... possibly even just a normal HP server with a storage OS of your choice (Linux exporting NFS, Windows Storage Server, etc.)
  • I've documented and outlined a ZFS solution for Linux that can provide dual-head failover and clustering for storage: See: https://github.com/ewwhite/zfs-ha
  • Another solution that can do shared-nothing with a pair of servers is Zetavault.
  • Couple that with Veeam VM-level replication or something array-based, and you've covered 99% of the potential storage issues.
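
To put rough numbers on the "shared storage becomes a point of failure" remark, here is a minimal back-of-the-envelope sketch. It assumes components fail independently, and the availability figures are made-up placeholders for illustration, not measurements of any HP or VMware product. Components that must all be up are combined in series; redundant components are combined in parallel.

```python
HOURS_PER_YEAR = 24 * 365


def series(*availabilities):
    """Every component must be up, so the availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """At least one component must be up: 1 minus the chance all are down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def downtime_hours(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical figures, chosen only to illustrate the shape of the problem.
host = 0.999      # one ESXi host, including its internal disks
storage = 0.999   # one single-headed shared storage box

single_host = host
two_hosts_single_storage = series(parallel(host, host), storage)
two_hosts_mirrored_storage = series(parallel(host, host),
                                    parallel(storage, storage))

for name, a in [
    ("single host with DAS", single_host),
    ("two hosts + one storage box", two_hosts_single_storage),
    ("two hosts + mirrored storage", two_hosts_mirrored_storage),
]:
    print(f"{name}: {a:.6f} (~{downtime_hours(a):.2f} h/year down)")
```

Under these assumed numbers, two hosts in front of a single storage box come out no better than the single host you already have, because the storage box is now the component everything depends on. Only when the storage itself is redundant does the picture improve, which is the trade-off behind the options listed above.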

But again, this is a function of your risk. People can easily go down the High Availability rabbit hole...

Dual hypervisor hosts... okay. Then do you need dual switching fabrics? Stacked switches? Multi-chassis link-aggregation (MLAG/MC-LAG)? One SAN with dual-controllers? Two SANs? SAN replication? VM replication? VM replication to diverse storage?

Do you have power diversity? Multiple PDUs? Multiple UPS units? Is the site generator-backed?

So, what are you left with?

I think it's best to have some options. Maybe contract additional help for coverage. Document the solution well enough so that the customer has some options. Make a DR or system outage runbook/script.

Solution 2:

If your company cannot withstand any downtime for its users, then VMware FT is your choice. To implement this feature, you'll definitely need some kind of shared storage. For this case, I would recommend looking at software-defined storage (SDS) solutions, which are increasingly being used for building virtualized infrastructures. With this approach, you can virtualize the local physical storage of your ESXi hosts and turn it into a fully fledged virtual SAN. VMware vSAN springs immediately to mind, but I would point out some very interesting alternatives that should be much cheaper to implement in an ESXi environment. The first candidate is HPE VSA: a good level of functionality, and an annoying requirement of a third voting node for quorum. Yeah, I know, you can still go with 2 nodes, but if you're not OK with downtime, the quorum is a must. The second candidate, by contrast, has a minimal hardware footprint of just two physical hosts, along with a set of features like caching, data compression, etc. It is StarWind vSAN. Both solutions have free versions, so just check and see how you would benefit from them.
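
To illustrate why a third voting node matters, here is a minimal sketch of generic majority voting. It is not how HPE VSA or any other product is actually implemented; the function name and the vote counts are hypothetical and only show the arithmetic.

```python
def has_quorum(votes_visible: int, total_votes: int) -> bool:
    """A side may keep serving storage only if it sees a strict majority."""
    return votes_visible > total_votes // 2


# Two storage nodes, no witness: the link between them fails.
# Each node sees only its own vote, so neither side has a majority.
# Either both stop (downtime) or both carry on (split brain).
print(has_quorum(votes_visible=1, total_votes=2))  # node A alone
print(has_quorum(votes_visible=1, total_votes=2))  # node B alone

# Two storage nodes plus a third voting node: the same link fails.
# The node that can still reach the witness sees 2 of 3 votes and
# continues; the isolated node sees 1 of 3 and stops.
print(has_quorum(votes_visible=2, total_votes=3))  # node A + witness
print(has_quorum(votes_visible=1, total_votes=3))  # node B isolated
```

With an even number of votes, a clean split leaves no side with a majority, so the cluster cannot safely decide which copy of the data stays writable. An odd number of votes always leaves at most one side able to continue.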