VMware vSphere cluster design for site redundancy
I have a question about the best design for site redundancy when using vSphere clusters. First, a bit of background about our situation.
We are a medium-sized company with two main offices, located in different countries. Our networks are linked by a Layer 2 150 Mbps leased line which is currently underused. We have a variety of services running for internal use within the company, some on physical servers and some on existing vSphere clusters. In our department we also run several services (almost all running under various forms of Linux) like NTP, Syslog, jump servers, monitoring servers and so on.
We now have the requirement that those servers be redundant within each location (which they are not at the moment) and also site-redundant (which they are to some extent: the servers are duplicated in the second location, with configurations kept in sync via various methods at the application layer). There is no SAN available for us, at least not something that we can use at the moment.
Cost is also an issue. While we do have some budget available for this, we can't afford to buy SANs for both locations for example.
I looked at the VSA feature and it seems that this could be something for us, but I am unsure how to solve the site-redundancy requirement.
At the moment, for testing purposes, I am setting up vSphere 5 with VSA on two ESXi hosts in a lab. I am currently using the Essentials Plus kit with VSA license, which allows me to build a VSA cluster on up to 3 hosts, together with a vCenter license to manage them. The hosts each have two dual-port network cards and two 600 GB drives running in RAID 1. Hardware-wise this will be enough for us to run all the services we need as VMs and will provide redundancy within the site.
At the moment I see only two options for site redundancy:
- build an identical VSA cluster in the second location and keep the various services sync'ed at the application layer (database sync, rsync and so on).
- simply move one of the hosts from the existing cluster to the second location, basically having the VSA cluster span the 150Mbps link between the sites.
I would very much prefer the second option, but I am unsure how well it will work, if it can work at all. Technically it should: we can span the needed VLANs across the leased line and have them available in the second location. The advantage would be that we don't need to worry at all about sync'ing databases and the like. But I have the feeling that the bandwidth will not be enough, and I have no way of knowing how much traffic the VSA cluster will generate between the hosts. I realize that this will most likely depend on the individual usage of the VMs, but still, I have no idea how VSA replicates data between the ESXi hosts.
Are these my only options, or can my goals be achieved in some other way? Is there perhaps a way to have some sort of "cold standby" cluster in the second location, where the VMs would be sync'ed once per night from the main location? The idea is that if the first site becomes unavailable, we would be able to bring all those VMs online there. We would be OK with the data being one day old.
Any answers are appreciated.
Best regards, Stefan
Solution 1:
I would simply recommend maintaining two separate clusters and handling replication at the virtual machine level with the vSphere Replication product. This is available to you with your vSphere Essentials Plus license and allows you to maintain an RPO between 15 minutes and 24 hours (adjustable per-VM), as well as the ability to replicate to dissimilar datastore types. vSphere Replication is a component used in VMware's larger Site Recovery Manager disaster recovery suite.
I use vSphere Replication to handle offsite protection of some critical virtual machines. The newest revision of the software allows you to keep point-in-time snapshots at the destination.
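To get a rough feel for whether the 150 Mbps link can keep up with a given RPO, a back-of-the-envelope calculation like the one below may help. This is only a sketch: the amount of changed data per replication cycle and the usable share of the link (`usable_fraction`) are assumptions you would need to measure in your own environment, and the function names are made up for illustration. It does not model compression, protocol overhead, or how vSphere Replication actually schedules transfers.

```python
def transfer_hours(changed_gb: float, link_mbps: float,
                   usable_fraction: float = 0.7) -> float:
    """Hours needed to push `changed_gb` of changed data over the link.

    usable_fraction is an assumed share of the link that replication
    may consume; it is not a measured or vendor-provided figure.
    """
    if changed_gb < 0 or link_mbps <= 0:
        raise ValueError("changed_gb must be >= 0 and link_mbps > 0")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_mbps = link_mbps * usable_fraction
    megabits = changed_gb * 8 * 1000  # decimal GB -> megabits
    return megabits / usable_mbps / 3600


def fits_rpo(changed_gb: float, rpo_hours: float,
             link_mbps: float = 150.0,
             usable_fraction: float = 0.7) -> bool:
    """True if one cycle's changed data can be shipped within the RPO."""
    return transfer_hours(changed_gb, link_mbps, usable_fraction) <= rpo_hours


if __name__ == "__main__":
    # Hypothetical figures: 50 GB of changes per day, and a full 600 GB
    # initial copy (the size of one mirrored drive pair in the question).
    for label, gb in (("daily delta", 50), ("initial full copy", 600)):
        hours = transfer_hours(gb, link_mbps=150)
        print(f"{label}: {gb} GB takes about {hours:.1f} h")
    print("50 GB fits a 24 h RPO:", fits_rpo(50, rpo_hours=24))
    print("50 GB fits a 15 min RPO:", fits_rpo(50, rpo_hours=0.25))
```

If the estimated transfer time is close to or above the RPO you have in mind, either relax the RPO for those VMs or replicate fewer of them; since the RPO is adjustable per VM, the two can be mixed.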