keepalived VRRP_script not failing over
Solution 1:
I had exactly the same issue; however, my problem was neither the firewall nor the Ethernet adapter but the "weight" setting of the check script.
This was my configuration:
MASTER:
vrrp_instance haproxy {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    # remaining settings not shown
}
BACKUP:
vrrp_instance haproxy {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    # remaining settings not shown
}
Check script:
vrrp_script chk_haproxy {
    script "python /root/ha_check.py"
    interval 2    # check every 2 seconds
    weight 2
    rise 2
    fall 2
}
The master refused to release the VIP because, even though the script had failed, it still had a higher priority than the BACKUP server. The "weight" of the check script was too small to cover the gap between the two priorities, so a failed check never let the BACKUP server's priority rise above the MASTER's. In more detail:
According to the keepalived manual, a positive "weight" is added to the priority while the check succeeds.
A negative "weight" reduces the priority by that amount while the check fails.
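The weight rule above can be written down as a small function. This is only a sketch of the rule as stated in this answer, not keepalived's actual code; the name effective_priority is made up for illustration, and a weight of 0 is simply treated as "no change", which is a simplification.

```python
def effective_priority(base: int, weight: int, check_ok: bool) -> int:
    """Priority a node advertises under the weight rule described above."""
    if weight > 0 and check_ok:
        return base + weight   # positive weight: bonus while the check passes
    if weight < 0 and not check_ok:
        return base + weight   # negative weight: penalty while the check fails
    return base                # otherwise the base priority is unchanged


# The MASTER from the configuration above: priority 150, weight 2.
print(effective_priority(150, 2, check_ok=True))     # 152
print(effective_priority(150, 2, check_ok=False))    # 150
# The same node with a negative weight such as -60.
print(effective_priority(150, -60, check_ok=False))  # 90
```

Note that a positive weight never pushes the priority below its base value; a failed check only removes the bonus.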
So, according to my configuration (with the check passing on both servers):
Server priorities BEFORE failure of the script:
MASTER: 152
BACKUP: 102
Failover_IP: MASTER
The failover IP is correctly held by the MASTER server, since it has a higher priority than the BACKUP server (152 > 102).
Server priorities AFTER failure of the script on the MASTER:
MASTER: 150
BACKUP: 102
Failover_IP: STILL ON MASTER
The failover IP stays on the MASTER server because it still has a higher priority than the BACKUP (150 > 102): with a positive weight, a failed check only removes the +2 bonus and never pushes the priority below its base value of 150. The MASTER refused to release the IP, and rightly so, since its priority was still higher than the other server's.
There were two ways to fix this in my situation:
Option 1: Change the priorities of both servers so that the gap between them is smaller than the weight.
For example:
Master Priority: 150
Backup Priority: 149
Check_script weight: unchanged (2).
With the above configuration, when the script succeeds on both servers (meaning all is OK) the priorities are:
Master: 152
Backup: 151
IP_Location: On Master (152 > 151)
When the script fails on the master:
Master: 150
Backup: 151
IP_Location: On Backup (151 > 150)
Option 2: Change the weight of the script from 2 to -60, so that a failed check lowers the master's priority from 150 to 90, below the backup's 100.
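To compare the original setup with the two fixes, the arithmetic can be run side by side. This is a rough sketch that assumes the check keeps passing on the backup while it fails on the master; the function names are made up for illustration, and ties are simply left with the master.

```python
def effective_priority(base, weight, check_ok):
    # Positive weight: added while the check passes.
    # Negative weight: subtracted while the check fails.
    if (weight > 0 and check_ok) or (weight < 0 and not check_ok):
        return base + weight
    return base


def vip_holder_after_master_failure(master_base, backup_base, weight):
    """Who holds the VIP once the check fails on the master only."""
    master = effective_priority(master_base, weight, check_ok=False)
    backup = effective_priority(backup_base, weight, check_ok=True)
    holder = "BACKUP" if backup > master else "MASTER"
    return holder, master, backup


scenarios = [
    ("original (150/100, weight 2)", 150, 100, 2),
    ("closer priorities (150/149, weight 2)", 150, 149, 2),
    ("negative weight (150/100, weight -60)", 150, 100, -60),
]
for label, master_base, backup_base, weight in scenarios:
    print(label, vip_holder_after_master_failure(master_base, backup_base, weight))
# original (150/100, weight 2) ('MASTER', 150, 102)
# closer priorities (150/149, weight 2) ('BACKUP', 150, 151)
# negative weight (150/100, weight -60) ('BACKUP', 90, 100)
```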
Solution 2:
I've had the same issue - two CentOS 7.1 servers with track_script, and failing the vrrp_script on the MASTER would only result in the lone log message "VRRP_Script(chk_script) failed", not in a failover. On the BACKUP server, however, I got a lot of messages of keepalived trying to take over the virtual IP for as long as I had the track_script on the MASTER server fail.
Solution in my case: The firewall (iptables) on the MASTER server wasn't configured correctly to allow VRRP packets / multicast packets, while at the same time the firewall on the other server, the BACKUP, was configured correctly.
I had entered the same iptables rules into both servers as follows:
iptables -A INPUT -i eth0 -d 224.0.0.0/8 -j ACCEPT
iptables -A INPUT -p vrrp -i eth0 -j ACCEPT
This worked on one of the servers (the BACKUP VRRP server) but not the MASTER one because I'd forgotten that the interface wasn't named 'eth0' on the MASTER server, thus the two rules had no effect at all.
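A quick way to catch this kind of mistake is to compare the interface names used in the rules against the interfaces that actually exist on the host. The sketch below uses Python's socket.if_nameindex() for that; the helper name unknown_interfaces and the example interface name ens192 are made up for illustration.

```python
import socket


def unknown_interfaces(rule_interfaces, available=None):
    """Return interface names used in firewall rules that do not exist.

    If `available` is not given, the local machine's interfaces are used.
    """
    if available is None:
        available = [name for _index, name in socket.if_nameindex()]
    return sorted(set(rule_interfaces) - set(available))


# Hypothetical host whose NIC is called ens192 rather than eth0.
print(unknown_interfaces(["eth0"], available=["lo", "ens192"]))  # ['eth0']
```

Calling unknown_interfaces(["eth0"]) without the second argument checks the machine the script runs on; an empty list means every interface named in the rules exists.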
This explained the behavior I'd observed:
If keepalived cannot see any other VRRP speaker for a given virtual_router_id, it still believes it has the highest priority (and is thus the rightful MASTER), even after a negative weight adjustment. It never receives VRRP messages with a priority higher than its own, because the other speakers' advertisements are blocked by the firewall and never reach the keepalived process. That's why you don't see it release the VIP.
The BACKUP server, however, was able to see the adverts of the (now failed) MASTER, found the priority in those packets reduced to a value less than its own, and proceeded to declare itself MASTER and send gratuitous ARPs to claim the VIP. So we ended up in a situation where both servers thought they'd need to serve the VIP as MASTER.
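The resulting double-MASTER situation can be reproduced with a toy model. This is a deliberately simplified sketch, not the real VRRP state machine: each node considers itself MASTER unless an advert it can actually receive carries a higher priority, and ties are not modeled. The node names and priority values are made up for illustration.

```python
def masters(priorities, hears):
    """Return the nodes that consider themselves MASTER.

    priorities: node name -> priority it advertises
    hears: node name -> set of nodes whose adverts reach it
    """
    result = []
    for node, priority in priorities.items():
        # Highest priority among the adverts this node actually receives.
        best_heard = max((priorities[peer] for peer in hears[node]), default=-1)
        if priority > best_heard:
            result.append(node)
    return sorted(result)


# "A" is the original MASTER after its check failed, "B" is the BACKUP.
priorities = {"A": 90, "B": 100}

# Firewalls correct on both nodes: only B claims the VIP.
print(masters(priorities, {"A": {"B"}, "B": {"A"}}))  # ['B']

# A's firewall drops incoming adverts: both nodes claim the VIP.
print(masters(priorities, {"A": set(), "B": {"A"}}))  # ['A', 'B']
```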
Conclusions:
- Always check the firewall configuration on all VRRP speakers if you experience strange behavior (no failover, several MASTERs). Keepalived logging isn't quite as helpful as it could be: a simple message such as "VIP not released because I'm still highest prio" after the "VRRP_Script(chk_script) failed" line would've eased troubleshooting immensely.
- A track_script is not an on/off type of switch ("if script OK: eligible for VIP election; if NOT OK: completely ineligible for VIP election") - it merely increases / decreases the likelihood of winning the election, and if keepalived only ever observes itself as the only VRRP speaker and never receives any messages of other speakers, there's not much of an election really - you always win.