keepalived issues on xen domU

I cannot manage to run keepalived correctly on xen domU.

I am following this link for configuration, and it works great on some local VM (running with KVM). If I set up the exact same configuration, but on xen domU, it does not work: both servers do not see each other and decide to be master (10.10.0.200 being the virtual IP)

$ sudo ip addr sh eth0 # host1
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:16:3e:73:b0:78 brd ff:ff:ff:ff:ff:ff
inet 10.10.0.100/24 brd 10.10.0.255 scope global eth0
inet 10.10.0.200/32 scope global eth0
inet6 fe80::216:3eff:fe73:b078/64 scope link 
   valid_lft forever preferred_lft forever

$ sudo ip addr sh eth0 # host2
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:16:3e:ee:5e:fd brd ff:ff:ff:ff:ff:ff
inet 10.10.0.101/24 brd 10.10.0.255 scope global eth0
inet 10.10.0.200/32 scope global eth0
inet6 fe80::216:3eff:feee:5efd/64 scope link 
   valid_lft forever preferred_lft forever

Is there a way I could debug this - it seems some people are able to use keepalived on xen following some mailing list, but without much info on their config.

the domain 0 has two "real" ethernet cards, eth0 and eth1, with eth0 connected to the network:

  • eth0 is listening to 192.168.3.9
  • eth1 is listening to 10.10.0.1

My xend config is :

(xend-relocation-server no)
(network-script 'network-nat netdev=eth1')
(vif-script     vif-nat)
(dom0-min-mem 1024)
(enable-dom0-ballooning no)
(total_available_memory 0) 
(dom0-cpus 0)
(vncpasswd '')

And the relevant section in /etc/hosts in xend is:

10.10.0.100    test1 test1
10.10.0.101    test2 test2

Each domU (test1 and test2) are configured to 10.10.0.100 and 10.10.0.101 respectively. Each can ping to each other through those names (configured manually through /etc/hosts for now). The virtual IP is 10.10.0.200

Note that for now, I don't care so much about the networking configuration in the dom0 (bridge vs ...), I would like to make keepalived work between the domU at all as a first step

The current ip tables on the dom0:

# Generated by iptables-save v1.4.8 on Tue Apr 19 12:52:04 2011
*filter
:INPUT ACCEPT [37536:5302365]
:FORWARD ACCEPT [5367:1221790]
:OUTPUT ACCEPT [30601:3514407]
-A FORWARD -m state --state RELATED,ESTABLISHED -m physdev --physdev-out vif8.0 -j ACCEPT 
-A FORWARD -p udp -m physdev --physdev-in vif8.0 -m udp --sport 68 --dport 67 -j ACCEPT 
-A FORWARD -m state --state RELATED,ESTABLISHED -m physdev --physdev-out vif8.0 -j ACCEPT 
-A FORWARD -s 10.10.0.101/32 -m physdev --physdev-in vif8.0 -j ACCEPT 
COMMIT
# Completed on Tue Apr 19 12:52:04 2011
# Generated by iptables-save v1.4.8 on Tue Apr 19 12:52:04 2011
*nat
:PREROUTING ACCEPT [1441667:468129452]
:POSTROUTING ACCEPT [608454:36641119]
:OUTPUT ACCEPT [608448:36640127]
-A POSTROUTING -o eth1 -j MASQUERADE 
-A POSTROUTING -o eth1 -j MASQUERADE 
-A POSTROUTING -o eth1 -j MASQUERADE 
-A POSTROUTING -s 10.10.0.0/24 -o eth0 -j SNAT --to-source 192.168.3.9 
COMMIT
# Completed on Tue Apr 19 12:52:04 2011

As for the keep alive config:

# test1 config
vrrp_script chk_haproxy {               # Requires keepalived-1.1.13
    script "killall -0 haproxy"     # cheaper than pidof
    interval 2                      # check every 2 seconds
    weight 2                        # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101                    # 101 on master, 100 on backup
    virtual_ipaddress {
        10.10.0.200
    }
    track_script {
        chk_haproxy
    }
}

and for test2:

vrrp_script chk_haproxy {               # Requires keepalived-1.1.13
    script "killall -0 haproxy"     # cheaper than pidof
    interval 2                      # check every 2 seconds
    weight 2                        # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 100                    # 101 on master, 100 on backup
    virtual_ipaddress {
        10.10.0.200
    }
    track_script {
        chk_haproxy
    }
}

Each host can "arping" each other:

# on test1
sudo arping test2
ARPING 10.10.0.101 from 10.10.0.100 eth0
Unicast reply from 10.10.0.101 [FE:FF:FF:FF:FF:FF]  751.879ms
Unicast reply from 10.10.0.101 [FE:FF:FF:FF:FF:FF]  0.626ms
...

# on test2
sudo arping test1
ARPING 10.10.0.100 from 10.10.0.101 eth0
Unicast reply from 10.10.0.100 [FE:FF:FF:FF:FF:FF]  105.399ms
Unicast reply from 10.10.0.100 [FE:FF:FF:FF:FF:FF]  0.655ms

[EDIT] If I remove the track_script line from keepalived config, and restart, I get the following log:

Apr 19 13:35:06 test1 Keepalived: Terminating on signal
Apr 19 13:35:06 test1 Keepalived: Stopping Keepalived v1.1.20 (08/18,2010)
Apr 19 13:35:06 test1 Keepalived_vrrp: Terminating VRRP child process on signal
Apr 19 13:35:06 test1 Keepalived_healthcheckers: Terminating Healthchecker child process on signal
Apr 19 13:35:07 test1 Keepalived: Starting Keepalived v1.1.20 (08/18,2010)
Apr 19 13:35:07 test1 Keepalived: Starting Healthcheck child process, pid=4848
Apr 19 13:35:07 test1 Keepalived: Starting VRRP child process, pid=4849
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Initializing ipvs 2.6
Apr 19 13:35:07 test1 Keepalived_vrrp: Registering Kernel netlink reflector
Apr 19 13:35:07 test1 Keepalived_vrrp: Registering Kernel netlink command channel
Apr 19 13:35:07 test1 Keepalived_vrrp: Registering gratutious ARP shared channel
Apr 19 13:35:07 test1 Keepalived_vrrp: Initializing ipvs 2.6
Apr 19 13:35:07 test1 Keepalived_healthcheckers: IPVS: Can't initialize ipvs: Protocol not available
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Registering Kernel netlink reflector
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Registering Kernel netlink command channel
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Apr 19 13:35:07 test1 Keepalived_vrrp: IPVS: Can't initialize ipvs: Protocol not available
Apr 19 13:35:07 test1 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Configuration is using : 3103 Bytes
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Using LinkWatch kernel netlink reflector...
Apr 19 13:35:07 test1 Keepalived_vrrp: Configuration is using : 31958 Bytes
Apr 19 13:35:07 test1 Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Apr 19 13:35:08 test1 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Apr 19 13:35:09 test1 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE

and:

Apr 19 13:34:43 test2 Keepalived: Terminating on signal
Apr 19 13:34:43 test2 Keepalived: Stopping Keepalived v1.1.20 (08/18,2010)
Apr 19 13:34:43 test2 Keepalived_vrrp: Terminating VRRP child process on signal
Apr 19 13:34:43 test2 Keepalived_healthcheckers: Terminating Healthchecker child process on signal
Apr 19 13:34:44 test2 Keepalived: Starting Keepalived v1.1.20 (08/18,2010)
Apr 19 13:34:44 test2 Keepalived: Starting Healthcheck child process, pid=3811
Apr 19 13:34:44 test2 Keepalived: Starting VRRP child process, pid=3812
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Initializing ipvs 2.6
Apr 19 13:34:44 test2 Keepalived_vrrp: Registering Kernel netlink reflector
Apr 19 13:34:44 test2 Keepalived_vrrp: Registering Kernel netlink command channel
Apr 19 13:34:44 test2 Keepalived_vrrp: Registering gratutious ARP shared channel
Apr 19 13:34:44 test2 Keepalived_vrrp: Initializing ipvs 2.6
Apr 19 13:34:44 test2 Keepalived_healthcheckers: IPVS: Can't initialize ipvs: Protocol not available
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Registering Kernel netlink reflector
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Registering Kernel netlink command channel
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Apr 19 13:34:44 test2 Keepalived_vrrp: IPVS: Can't initialize ipvs: Protocol not available
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Configuration is using : 3103 Bytes
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Using LinkWatch kernel netlink reflector...
Apr 19 13:34:44 test2 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Apr 19 13:34:44 test2 Keepalived_vrrp: Configuration is using : 31958 Bytes
Apr 19 13:34:44 test2 Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Apr 19 13:34:45 test2 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Apr 19 13:34:46 test2 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE

The 'state MASTER' will confuse matters as they will both initially transition to MASTER and assume the IP (as per your logs) - you only want MASTER on one of them and BACKUP on the other (so one starts off passive).

However, since they both presumably stay as MASTER it would suggest they can't see each other's VRRP announcements (if they could one would step down after seeing a higher priority announced).

Check you can see multicast traffic from both hosts (tcpdump multicast).

Edit: crap, just realised this is quite old - might be useful for anyone else using keepalived though.