Why is a ping response being sent to the wrong gateway?

In a previous question, I was trying to determine why my OpenVPN clients could not ping the server LAN, even though the server LAN could ping the clients.

After further investigation, I determined that, for at least one of the servers, the problem is that the kernel sends the Ethernet frame containing the ping reply to the MAC address of a host that does not know how to route the packet.

The relevant IP and MAC addresses are:

10.11.11.7 de:ad:be:7f:45:72 
10.11.11.1 00:10:db:ff:70:01
10.11.11.2 de:ad:be:3b:24:48 

Pings from 10.11.11.7 to 10.8.0.10 work. Ping requests from 10.8.0.10 to 10.11.11.7 arrive as expected, but the replies never reach 10.8.0.10, apparently because they are sent to 10.11.11.1 instead of 10.11.11.2, the host running the VPN server that can route to 10.8.0.0/24.

For example:

When I ping 10.8.0.10 from 10.11.11.7, the request is sent to the MAC address of 10.11.11.2, the VPN gateway that can reach 10.8.0.10:

01:46:39.973670 de:ad:be:7f:45:72 > de:ad:be:3b:24:48, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: ICMP (1), length: 84) 10.11.11.7 > 10.8.0.10: ICMP echo request, id 49247, seq 6, length 64
0x0000:  4500 0054 0000 4000 4001 1b86 0a0b 0b07  E..T..@.@.......
0x0010:  0a08 000a 0800 37a4 c05f 0006 7ff8 5f4f  ......7.._...._O
0x0020:  0000 0000 53db 0e00 0000 0000 1011 1213  ....S...........
0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
0x0050:  3435 3637                                4567

The expected response arrives via the reverse path...

01:46:40.145368 de:ad:be:3b:24:48 > de:ad:be:7f:45:72, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl  63, id 53200, offset 0, flags [none], proto: ICMP (1), length: 84) 10.8.0.10 > 10.11.11.7: ICMP echo reply, id 49247, seq 6, length 64
0x0000:  4500 0054 cfd0 0000 3f01 8cb5 0a08 000a  E..T....?.......
0x0010:  0a0b 0b07 0000 3fa4 c05f 0006 7ff8 5f4f  ......?.._...._O
0x0020:  0000 0000 53db 0e00 0000 0000 1011 1213  ....S...........
0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
0x0050:  3435 3637                                4567

On the other hand, when 10.8.0.10 pings 10.11.11.7, the ping request is received on the expected interface:

01:46:11.734359 de:ad:be:3b:24:48 > de:ad:be:7f:45:72, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl  63, id 0, offset 0, flags [DF], proto: ICMP (1), length: 84) 10.8.0.10 > 10.11.11.7: ICMP echo request, id 15635, seq 74, length 64
0x0000:  4500 0054 0000 4000 3f01 1c86 0a08 000a  E..T..@.?.......
0x0010:  0a0b 0b07 0800 c1ff 3d13 004a 65f8 5f4f  ........=..Je._O
0x0020:  0000 0000 7088 0400 0000 0000 1011 1213  ....p...........
0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
0x0050:  3435 3637                                4567

but it leaves in the direction of 10.11.11.1, instead of 10.11.11.2:

01:46:11.734383 de:ad:be:7f:45:72 > 00:10:db:ff:70:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl  64, id 41757, offset 0, flags [none], proto: ICMP (1), length: 84) 10.11.11.7 > 10.8.0.10: ICMP echo reply, id 15635, seq 74, length 64
0x0000:  4500 0054 a31d 0000 4001 b868 0a0b 0b07  E..T....@..h....
0x0010:  0a08 000a 0000 c9ff 3d13 004a 65f8 5f4f  ........=..Je._O
0x0020:  0000 0000 7088 0400 0000 0000 1011 1213  ....p...........
0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
0x0050:  3435 3637                                4567
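
To make the captures easier to read, here is a minimal Python sketch that maps the source and destination MAC addresses on a tcpdump -e line back to the hosts listed at the top of the question. The names MAC_TO_IP and frame_hosts are my own, chosen only for illustration, and the regular expression assumes the lowercase colon-separated MAC format shown in the captures.

```python
import re

# IP/MAC pairs copied from the list at the top of the question.
MAC_TO_IP = {
    "de:ad:be:7f:45:72": "10.11.11.7",
    "00:10:db:ff:70:01": "10.11.11.1",
    "de:ad:be:3b:24:48": "10.11.11.2",
}

# Matches "<src mac> > <dst mac>," as printed by tcpdump -e.
MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
FRAME = re.compile(rf"(?P<src>{MAC}) > (?P<dst>{MAC}),")


def frame_hosts(line):
    """Return (source host, destination host) for one capture line, or None."""
    match = FRAME.search(line)
    if match is None:
        return None
    return (
        MAC_TO_IP.get(match["src"], "unknown"),
        MAC_TO_IP.get(match["dst"], "unknown"),
    )


reply = ("01:46:11.734383 de:ad:be:7f:45:72 > 00:10:db:ff:70:01, "
         "ethertype IPv4 (0x0800), length 98")
print(frame_hosts(reply))  # ('10.11.11.7', '10.11.11.1')
```

For the echo reply above, the frame goes from 10.11.11.7 to the MAC address of 10.11.11.1, not to 10.11.11.2.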

This is unexpected, because the route table on 10.11.11.7 is configured as follows:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.11.11.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.8.0.0        10.11.11.2      255.255.255.0   UG    0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         10.11.11.2      0.0.0.0         UG    0      0        0 eth0
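
To make the expectation concrete, here is a minimal Python sketch of a longest-prefix-match lookup over the table above. It is only an illustration, not how the kernel is implemented: the name next_hop and the tuple layout are my own, and only the table printed by /sbin/route is modeled.

```python
import ipaddress

# Routes copied from the /sbin/route output above.
# Each entry is (destination network, gateway); None means directly connected.
MAIN_TABLE = [
    ("10.11.11.0/24", None),
    ("10.8.0.0/24", "10.11.11.2"),
    ("169.254.0.0/16", None),
    ("0.0.0.0/0", "10.11.11.2"),
]


def next_hop(dest, table=MAIN_TABLE):
    """Return the next-hop IP for dest using longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), gateway)
        for net, gateway in table
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    _net, gateway = max(matches, key=lambda m: m[0].prefixlen)
    # A directly connected route has no gateway: deliver straight to dest.
    return gateway if gateway is not None else dest


print(next_hop("10.8.0.10"))   # 10.11.11.2
print(next_hop("10.11.11.1"))  # 10.11.11.1
```

Going by this table alone, nothing addressed to 10.8.0.10 should ever be handed to 10.11.11.1.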

So, my question is: why is the kernel sending the ping response in the direction of 10.11.11.1, even though the gateway is defined as 10.11.11.2?

Update:

By polluting the ARP cache on 10.11.11.7 with a static entry for 10.11.11.1 that actually holds the MAC address of 10.11.11.2, e.g.:

sudo /sbin/arp -s 10.11.11.1 de:ad:be:3b:24:48

I was able to get ping from 10.8.0.10 to 10.11.11.7 working as expected.

Obviously, this was just for purposes of demonstration. Why is my kernel choosing the wrong destination MAC address in the first place?

Update 2:

According to lsmod, the network driver is probably:

virtio_net             48449  0 

which indicates, perhaps, that the virtual machine is running under KVM.

Update 3:

This question was answered by ptman's suggestion, in his answer to another question of mine, to consider policy-based and source-based routing.

Thank you, ptman!


Solution 1:

This question was answered by ptman's suggestion, in his answer to my question, to consider policy-based and source-based routing.

In a nutshell, the problem was caused by an adapter-specific default static route that was being consulted before any of the routes in the main routing table (the table displayed by /sbin/route).

This default route was catching packets destined for 10.8.0.0/24 and sending them towards 10.11.11.1 instead of the intended next hop, 10.11.11.2. As a result, the route that should have sent these packets to 10.11.11.2 was never being used.

The confusion arose, in part, because /sbin/route doesn't show adapter-specific static routes. Be aware of such routes, and get familiar with /sbin/ip rule and /sbin/ip route list table all.
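
The lookup order described above can be sketched in Python as follows. This is a rough model under loud assumptions, not the kernel's implementation: the table name eth0_table, the rule priority 100, and the source selector 10.11.11.7/32 are all hypothetical, since the real /sbin/ip rule output is not reproduced here. A source-based selector is assumed because it would be consistent with locally originated requests (looked up before a source address is chosen) working while replies (source already fixed) were diverted.

```python
import ipaddress

# HYPOTHETICAL tables: only "main" reflects the /sbin/route output from the
# question; "eth0_table" stands in for the adapter-specific default route.
TABLES = {
    "eth0_table": [("0.0.0.0/0", "10.11.11.1")],
    "main": [
        ("10.11.11.0/24", None),
        ("10.8.0.0/24", "10.11.11.2"),
        ("0.0.0.0/0", "10.11.11.2"),
    ],
}

# (priority, source selector, table name); lower priorities are tried first.
# The first rule is an assumption made for illustration only.
RULES = [
    (100, "10.11.11.7/32", "eth0_table"),
    (32766, "0.0.0.0/0", "main"),
]


def table_lookup(table, dest):
    """Longest-prefix match inside one table; None if nothing matches."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), gateway)
        for net, gateway in table
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    _net, gateway = max(matches, key=lambda m: m[0].prefixlen)
    return gateway if gateway is not None else dest


def policy_next_hop(src, dest, rules=RULES, tables=TABLES):
    """Walk the rules in priority order; the first table with a match wins."""
    src_addr = ipaddress.ip_address(src)
    for _priority, selector, name in sorted(rules):
        if src_addr not in ipaddress.ip_network(selector):
            continue
        hop = table_lookup(tables[name], dest)
        if hop is not None:
            return name, hop
    return None


# Echo reply: source already set to 10.11.11.7, so the early rule matches.
print(policy_next_hop("10.11.11.7", "10.8.0.10"))  # ('eth0_table', '10.11.11.1')
# Locally originated request with no source chosen yet falls through to main.
print(policy_next_hop("0.0.0.0", "10.8.0.10"))     # ('main', '10.11.11.2')
```

The point of the sketch is that the more specific 10.8.0.0/24 route in main never gets a chance once an earlier rule's table has already produced a match.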