Keepalived unwanted transition to master
I'm starting 2 EC2 instances on Amazon with CloudFormation; the second instance starts about 30 seconds after the first one.
The configuration looks like this:
Instance 2
state BACKUP
interface eth0
virtual_router_id 51
priority 100
unicast_peer {
172.17.16.10
}
Instance 1
state MASTER
interface eth0
virtual_router_id 51
priority 100
unicast_peer {
172.17.16.11
}
Both are configured with the same health check, which succeeds.
I set the same priority on both to avoid flapping: if the MASTER goes down, the BACKUP should become the new MASTER, and if the old MASTER comes back up, the roles should stay as they are.
The issue happens during the initial start:
- The first instance boots up and enters MASTER STATE.
- The second instance starts about 30 seconds later and initially enters BACKUP STATE, but for some reason it transitions to MASTER afterwards.
Both have the same priority, so why?
I noticed that the log messages about kernel IPVS and the host fingerprint calculation are only printed after keepalived has started. That makes me think keepalived is the first software on the system that makes it exchange packets on the network interface, and that this somehow triggers an unwanted failover.
Also, instance 2 enters BACKUP STATE as soon as keepalived starts:
Nov 17 17:43:58 ip-172-17-16-11 Keepalived_vrrp[2403]: Using LinkWatch kernel netlink reflector...
Nov 17 17:43:58 ip-172-17-16-11 Keepalived_vrrp[2403]: VRRP_Instance(VI_1) Entering BACKUP STATE
Nov 17 17:43:58 ip-172-17-16-11 Keepalived_vrrp[2403]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(15,16)]
whereas instance 1 becomes MASTER only after the kernel IPVS messages and the fingerprints:
Nov 17 17:44:28 ip-172-17-16-10 kernel: [ 157.650360] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
Nov 17 17:44:28 ip-172-17-16-10 kernel: [ 157.654035] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Nov 17 17:44:28 ip-172-17-16-10 kernel: [ 157.658356] IPVS: Creating netns size=2048 id=0
Nov 17 17:44:28 ip-172-17-16-10 kernel: [ 157.661163] IPVS: ipvs loaded.
Nov 17 17:44:28 ip-172-17-16-10 Keepalived_healthcheckers[2391]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 17 17:44:28 ip-172-17-16-10 Keepalived_healthcheckers[2391]: Configuration is using : 5174 Bytes
Nov 17 17:44:28 ip-172-17-16-10 Keepalived_healthcheckers[2391]: Using LinkWatch kernel netlink reflector...
Nov 17 17:44:28 ip-172-17-16-10 Keepalived_vrrp[2392]: VRRP_Script(chk_haproxy) succeeded
Nov 17 17:44:29 ip-172-17-16-10 ec2:
Nov 17 17:44:29 ip-172-17-16-10 ec2: #############################################################
Nov 17 17:44:29 ip-172-17-16-10 ec2: -----BEGIN SSH HOST KEY FINGERPRINTS-----
Nov 17 17:44:29 ip-172-17-16-10 ec2: -----END SSH HOST KEY FINGERPRINTS-----
Nov 17 17:44:29 ip-172-17-16-10 ec2: #############################################################
Nov 17 17:44:29 ip-172-17-16-10 Keepalived_vrrp[2392]: VRRP_Instance(VI_1) Transition to MASTER STATE
To test the configuration:
- I stop keepalived on both instances.
- I start keepalived on instance 1; it goes to MASTER.
- I start keepalived on instance 2; it goes to BACKUP and does not trigger a failover.
So everything looks OK in that case.
By design, when priorities are equal, VRRP selects the node with the highest primary IP address as the MASTER. In your setup that is instance 2 (172.17.16.11), whose address is higher than that of instance 1 (172.17.16.10).
https://www.juniper.net/techpubs/en_US/junose11.3/topics/concept/vrrp-router-election-rules.html
http://www.ietf.org/rfc/rfc3768.txt
If the Priority in the ADVERTISEMENT is equal to the local
Priority and the primary IP Address of the sender is greater
than the local primary IP Address, then:
o Cancel Adver_Timer
o Set Master_Down_Timer to Master_Down_Interval
o Transition to the {Backup} state
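To make the tie-break concrete, here is a minimal Python sketch of the rule quoted above, applied to the two addresses from the question. The function name `master_should_yield` is made up for illustration, and the sketch only models the comparison a router in the Master state makes when it receives an advertisement. It is not keepalived's actual code and it ignores timers and the special priority-0 advertisement.

```python
import ipaddress


def master_should_yield(local_priority, local_ip, adv_priority, adv_ip):
    """Return True if a router in Master state should fall back to Backup
    after receiving an advertisement (hypothetical helper, comparison only)."""
    # A strictly higher priority in the advertisement always wins.
    if adv_priority > local_priority:
        return True
    # On equal priority, the sender wins if its primary IP address is greater.
    if adv_priority == local_priority:
        return ipaddress.ip_address(adv_ip) > ipaddress.ip_address(local_ip)
    # A lower priority in the advertisement never displaces the master.
    return False


# Instance 1 (172.17.16.10) is MASTER and hears instance 2 (172.17.16.11).
instance1_yields = master_should_yield(100, "172.17.16.10", 100, "172.17.16.11")

# Instance 2 is MASTER and hears instance 1.
instance2_yields = master_should_yield(100, "172.17.16.11", 100, "172.17.16.10")

print(instance1_yields, instance2_yields)
```

With equal priorities the comparison comes down to the addresses alone, so under this rule the node at 172.17.16.10 gives way to the node at 172.17.16.11 and not the other way round.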