Architecture for highly available MySQL with automatic failover in physically diverse locations
I have been researching high availability (HA) solutions for MySQL between data centers.
For servers located in the same physical environment, I have preferred dual master with Heartbeat (floating VIP) in an active-passive configuration. The heartbeat runs over both a serial connection and an Ethernet connection.
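For reference, the current setup is roughly the following Heartbeat configuration (hostnames, interfaces, and the VIP below are placeholders):

    # /etc/ha.d/ha.cf -- both heartbeat paths declared
    serial /dev/ttyS0        # serial heartbeat link
    baud 19200
    bcast eth1               # crossover ethernet heartbeat
    keepalive 2              # heartbeat interval (seconds)
    deadtime 30              # declare the peer dead after 30s of silence
    auto_failback off
    node db1
    node db2

    # /etc/ha.d/haresources -- db1 preferred; floating VIP, then the mysql init script
    db1 IPaddr::192.168.0.100/24/eth0 mysql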
Ultimately, my goal is to maintain this same level of availability, but between data centers. I want to fail over dynamically between the two data centers without manual intervention while still maintaining data integrity.
There would be BGP on top, with web clusters in both locations that could route to the databases on either side. If the Internet connection went down at site 1, clients would route through site 2 to its web cluster, and then to the database at site 1, assuming the link between the sites was still up.
With this scenario, due to the lack of a physical (serial) link, split brain becomes much more likely. If the WAN went down between the sites, the VIP would end up active at both, and a variety of unpleasant scenarios could introduce desync.
Another potential issue I see is the difficulty of scaling this infrastructure to a third data center in the future.
The network layer is not the focus, and the architecture is flexible at this stage. Again, my focus is a solution for maintaining data integrity as well as automatic failover of the MySQL databases; I would likely design the rest around it.
Can you recommend a proven solution for MySQL HA between two physically diverse sites?
Thank you for taking the time to read this. I look forward to reading your recommendations.
You will face the CAP theorem problem: you cannot have consistency, availability, and partition tolerance at the same time.
DRBD/MySQL HA relies on synchronous replication at the block-device level. This is fine while both nodes are available, or if one suffers a temporary fault, is rebooted, etc., and then comes back. The problems start when you get a network partition.
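To make the mechanism concrete, here is a minimal sketch of a DRBD resource definition backing the MySQL data directory (hostnames, devices, and addresses are assumptions); protocol C is the synchronous mode described above:

    # /etc/drbd.conf -- synchronous mirror of the MySQL data volume
    resource r0 {
      protocol C;                # synchronous: a write completes only
                                 # after the peer acknowledges it
      on db1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;     # local backing block device
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on db2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }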
Network partitions are extremely likely when you're running across two datacentres. Essentially, neither party can distinguish a partition from a failure of the other node. The secondary doesn't know whether it should take over (the primary has failed) or not (the link is gone).
While your machines are in the same location, you can add a secondary channel of communication (typically a serial cable or crossover ethernet) to get around this problem, so the secondary knows when the primary is GENUINELY down and it's not just a network partition.
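The other standard insurance, if you run a Pacemaker-based stack, is fencing (STONITH): the survivor power-cycles the peer before taking over, so a half-dead primary can never keep the VIP. A sketch, assuming IPMI-capable hardware and the external/ipmi plugin (addresses and credentials are made up):

    # Each node can be fenced via its IPMI card before failover
    crm configure primitive fence-db1 stonith:external/ipmi \
        params hostname=db1 ipaddr=10.0.0.11 userid=admin passwd=secret interface=lan
    crm configure primitive fence-db2 stonith:external/ipmi \
        params hostname=db2 ipaddr=10.0.0.12 userid=admin passwd=secret interface=lan
    crm configure property stonith-enabled=true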
The next problem is performance. While DRBD can give decent** performance when your machines have a low-latency connection (e.g. gigabit ethernet - but some people use dedicated high-speed networks), the more latency the network has, the longer it takes to commit a transaction***. This is because DRBD has to wait for the secondary server (when it's online) to acknowledge every write before telling the application "OK", to ensure the durability of writes.
If you do this across different datacentres, you typically add several more milliseconds of latency, even if they are close by.
** Still much slower than a decent local IO controller
*** You cannot use MyISAM for a high availability DRBD system because it doesn't recover properly/automatically from an unclean shutdown, and clean recovery from exactly that is required during a failover.
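That rules out anything but InnoDB with full durability switched on; a sketch of the relevant my.cnf lines (values illustrative):

    # /etc/my.cnf -- settings a DRBD-backed failover setup relies on
    [mysqld]
    default-storage-engine         = InnoDB
    innodb_flush_log_at_trx_commit = 1   # fsync the InnoDB log at every commit
    sync_binlog                    = 1   # keep the binary log durable as well

The latency arithmetic is worth doing up front: with protocol C every commit also pays the inter-site round trip, so a 5 ms RTT caps a single connection at roughly 200 commits per second, no matter how fast the disks are.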
What about using a VLAN to tie all the servers at the two (or more) data centers together? You could then use CARP for automatic failover and database replication to keep everything in sync.
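A sketch of that idea using ucarp (a userland CARP implementation) plus standard MySQL replication; the addresses, password, and up/down scripts are placeholders:

    # Both nodes advertise the same virtual IP; the active one holds it
    ucarp --interface=eth0 --srcip=10.0.0.1 --vhid=1 --pass=secret \
          --addr=10.0.0.100 \
          --upscript=/usr/local/sbin/vip-up.sh \
          --downscript=/usr/local/sbin/vip-down.sh

    # On the standby, point replication at the current master
    mysql -e "CHANGE MASTER TO MASTER_HOST='10.0.0.1', \
              MASTER_USER='repl', MASTER_PASSWORD='replpass'; START SLAVE;"

Bear in mind that plain MySQL replication is asynchronous, so a failover can lose the last few transactions; that's the availability/consistency trade-off again.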
If you own the data centers, you can ensure each one has multiple WAN uplinks.
Your first stage should be to upgrade your current HA solution to one that uses OpenAIS as the cluster-membership layer: this will give you a lot of flexibility and, given low-latency links between sites, might be able to reach across. Pacemaker and RHEL Clustering support this.
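As a sketch, the equivalent of your dual-master/VIP pair expressed as Pacemaker resources via the crm shell (the VIP, paths, and names are assumptions):

    # Floating VIP plus mysqld, managed by the cluster
    crm configure primitive p_vip ocf:heartbeat:IPaddr2 \
        params ip=192.168.0.100 cidr_netmask=24 \
        op monitor interval=10s
    crm configure primitive p_mysql ocf:heartbeat:mysql \
        params binary=/usr/sbin/mysqld config=/etc/my.cnf \
        op monitor interval=30s
    crm configure group g_mysql p_vip p_mysql    # VIP always moves with mysqld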
For automatic data center failover, you really need a third site to act as a tie-breaker; otherwise your sites will not be able to distinguish inter-site routing problems from remote-site failure. Microsoft has some surprisingly good webcasts covering the area:
Windows Server 2008 multi-site clustering
Obviously the exact technology doesn't map onto the Linux domain, but the concepts are the same.
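On the Linux side, the tie-breaker falls out of ordinary quorum once there is a third vote: add a small arbiter node at the third site that runs no resources, and a partitioned site loses quorum and stands down instead of grabbing the VIP. A sketch, reusing the g_mysql group above ("arbiter" is an assumed node name):

    # With three votes, a minority partition stops its resources
    crm configure property no-quorum-policy=stop
    # The arbiter only votes; it must never run the database
    crm configure location l_not_on_arbiter g_mysql -inf: arbiter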