MariaDB: how to handle failure of 2 out of 3 nodes in a multi-master cluster

I run a 3-node multi-master MariaDB Galera cluster. It basically works for me as it is. When one node fails, the other two keep working and everything is fine.

But I'm wondering: is there a way for it to keep working on one node? (That is really unlikely, but I just wonder.) I know that the cluster stops when there is no quorum, to prevent split brain. I also know that with only one node left the cluster switches off, which means that all connections/queries to the db are lost.

I wonder if there is a way to switch the one remaining master into a single-node mode and let it keep working. Then, once the failure is noticed, I would stop the apps that connect to the db, bring the other cluster nodes back and let them replicate the data (so that nothing is lost).

I know that there is something called virtual quorum, but would that be a good choice in a 3-master-node setup?


Solution 1:

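Yes, you can. When only one node is left, it loses quorum, drops out of the Primary Component and stops serving queries in order to prevent split brain. You can prevent this by disabling split-brain protection (the pc.ignore_sb provider option), but doing so means that a network blip between two nodes can leave both servers accepting writes independently, which is exactly the split brain the protection exists to avoid.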

Personally, I'd never do that; it's crazy. If you don't want another full Galera server but do want to reduce the risk of losing quorum, just add another node running garbd (the Galera Arbitrator), which votes in quorum but stores no data.

The simple solution to this is the "rule of 3". If you want data-center availability, you need three data centers with the same number of nodes in each; if one data center fails, the remaining nodes still hold quorum, so the cluster stays online. Another way to think of it: if a single outage can take out half or more of the nodes, the survivors lose quorum and your cluster is down.
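To make the quorum arithmetic concrete, here is a rough Python sketch of the rule as I understand it from the Galera documentation: a group of nodes stays primary only if it holds strictly more than half of the weight of the last Primary Component, not counting nodes that left gracefully. The function name, node names and weights are made up for illustration; this is not Galera's actual code.

```python
def has_quorum(last_primary, current, left_gracefully=frozenset()):
    """Return True if the nodes in `current` hold quorum.

    last_primary:    dict of node name -> weight for the last Primary Component
    current:         set of node names that can still see each other
    left_gracefully: set of node names that shut down cleanly
    (All names here are hypothetical; weights default to 1 in Galera.)
    """
    # Weight that still "counts": last primary minus clean shutdowns.
    expected = sum(w for n, w in last_primary.items() if n not in left_gracefully)
    # Weight of the nodes that are still connected to each other.
    present = sum(w for n, w in last_primary.items() if n in current)
    # Strictly more than half is required; exactly half is not enough.
    return 2 * present > expected


three = {"db1": 1, "db2": 1, "db3": 1}

# One of three nodes crashes: the other two keep quorum.
one_down = has_quorum(three, {"db1", "db2"})

# Two of three crash at once: the survivor has 1 of 3 votes.
two_down = has_quorum(three, {"db1"})

# Second crash after the cluster already shrank to two members: 1 of 2 votes.
second_crash = has_quorum({"db1": 1, "db2": 1}, {"db1"})

# Same second crash, but with a garbd arbitrator as a third voter: 2 of 3 votes.
second_crash_garbd = has_quorum({"db1": 1, "db2": 1, "garbd": 1}, {"db1", "garbd"})

# Three data centers with two nodes each, one data center lost: 4 of 6 votes.
six = {f"dc{d}-n{i}": 1 for d in (1, 2, 3) for i in (1, 2)}
three_dcs = has_quorum(six, {n for n in six if not n.startswith("dc3")})

# Two data centers with three nodes each, one data center lost: 3 of 6 votes.
two_dc = {f"dc{d}-n{i}": 1 for d in (1, 2) for i in (1, 2, 3)}
two_dcs = has_quorum(two_dc, {n for n in two_dc if n.startswith("dc1")})

print(one_down, two_down, second_crash, second_crash_garbd, three_dcs, two_dcs)
```

Under these assumptions, only the cases where the survivors hold more than half of the votes stay online, which is why two data centers are never enough and why an arbitrator helps when nodes fail one after another.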