How does Raft handle a prolonged network partition?
Solution 1:
When C rejects an AppendEntries RPC from the leader A, it will return its now > 2
term. Raft replicas always recognize greater terms, so that in turn will cause the leader to step down and start a new election. Eventually, the cluster will converge on a new term that’s > 2
and which is >=
C’s term.
This is an oft discussed (in the Raft dev community) somewhat inconvenient scenario that can cause unnecessary churn in Raft clusters. To guard against it, the Raft dissertation — and most real-world implementations — introduce and use the so-called “pre-vote protocol.” The pre-vote protocol essentially dictates that before becoming a candidate, a follower must first determine whether it can win an election by asking its peers. In the scenario you described above, C would ask for a pre-vote from A and B, and because of the network partition it would not receive any votes. So, C would never transition to the candidate role, never increment the term, and thus never present a term > 2
after the partition heals. Thus, you’ve eliminated the churn.
You can read more about the pre-vote protocol in Diego’s dissertation.