PG pool failover leads to two independent masters with network failure

First, I think that pgpool2 does have a failback command, but this wouldn't help you much in that case. The problem is that chaos will result if both machines think they are the master. What's more, here you had a simple case: network went down. What if the network is partitioned? I.e., both machines are connected, but they somehow lose connectivity to each other. In that case, both machines will become the master, and they will serve different clients, and you will have a forked database. It's a rarer case, but are you certain it's so unlikely that you are prepared to risk the resulting chaos?

An alternative would be this:

                                    +- master db
                                    |
                ------ pgpool ------+
                                    |
                                    +- hot standby

In that case, however, you have a single point of failure, pgpool, which you probably don't want. I know only two ways to address this problem. The easiest is to only promote a standby to master manually, and this is applicable with your architecture. Your applications will need to go to read-only mode until human intervention.

The second way is to have quorums. One architecture that could work is this:

                +--- pgpool standing by -+    +- master db
                |                        |    |
   failover ip -+--- active pgpool      -+----+- hot standby 1
                |                        |    |
                +--- pgpool standing by -+    +- hot standby 2
                                              |
                                              +- hot standby 3

                                              (as many standby servers as
                                              you want, so that you have
                                              read-only load balancing)

The three pgpools are running on three different machines, each with its own IP address, but they also provide an additional failover IP address, taken only by the active machine, and it is the one used by clients. If the active pgpool fails, a standby pgpool takes it over. This can be accomplished with heartbeat.

In order to promote a hot standby to master, a quorum of pgpools (i.e. at least two of the three) must decide so; and they will implement the decision only after a delay of, say, 10 seconds after they decide. In addition, the active pgpool may not continue to use the existing master db for more than 10 seconds without getting confirmation from at least another pgpool (this is to guard against the case that the two standby pgpools lose their connection to the active pgpool and to the master at the same time, promote a hot standby to master, but the active pgpool continues to use the old master).

Actually the third pgpool need not participate in the failover IP, and just be there in order to help the quorum. In addition, I don't know if pgpool has enough features to do this. Maybe you need another daemon. A more general architecture is this:

              +--- active pgpool      -+          +- master db
              |                        |          | 
 failover ip -+                       -+----------+- hot standby 1
              |                        |          | 
              +--- pgpool standing by -+      +---+- hot standby 2
                                              |   | 
                                              |   +- hot standby 3
                monitoring daemon 1 ---+      |
                                       |      |
                monitoring daemon 2 ---+------+
                                       |
                monitoring daemon 3 ---+

In this case the load balancing done by pgpool is separated from the monitoring and promotion of standby to a master. Note that you can put pgpools, database servers, and monitoring daemons on the same machine, but the two pgpools must be on two different machines and the three monitoring daemons must be on three different machines. Note that I don't know whether a ready-made monitoring daemon with all necessary features exist.

The details can be changed, but I think that if you do automatic standby promotion to master without using a quorum, you are asking for trouble.


Is this even the correct architecture for this setup? Surely this is a common scenario, I can't get my head around how this kind of this could be fixed.

Disclaimer: I've not used pgpool, though I'm aware of what it does.

In clustering software you generally don't want any automated operations which could violate concurrency rules (e.g. something should only be online in one place) to take place when the cluster has knowledge of the status of less than half of the nodes. This prevents a split-brain condition, such as you've experienced. In a two-node cluster this would mean that in the event of loss of network interconnectivity between the two nodes, no automated failover should occur. A human being should make the decision to failover, having verified that it's the right action to take, depending upon the "other" node being offline, or accepting that there may be a loss of unreplicated transactions. I don't know whether this is something you can configure in pgpool.