Why is the network stack ignoring ICMP replies from a non-default interface?
I have the following situation:
- eth0 - default gateway ( ip: 172.28.183.100, gw: 172.28.183.1 )
- eth2 - secondary network connection ( ip: 172.28.171.2, gw: 172.28.171.1 )
The routing table looks like this:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.28.183.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
172.28.171.0    0.0.0.0         255.255.255.0   U     0      0        0 eth2
172.28.173.0    0.0.0.0         255.255.255.0   U     0      0        0 eth1
78.46.78.0      172.28.171.1    255.255.255.0   UG    0      0        0 eth2
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0
0.0.0.0         172.28.183.1    0.0.0.0         UG    100    0        0 eth0
As you can see, there is a special route for 78.46.78.0/24: this traffic should go through the secondary network on eth2.
This works. I can make any kind of TCP connection to machines in 78.46.78.0/24.
But when I try to mtr them, I get a weird result:
root@blob:~# mtr --report --report-cycles=5 78.46.78.198
HOST: blob Loss% Snt Last Avg Best Wrst StDev
1. 172.28.171.1 0.0% 5 0.6 0.6 0.5 0.6 0.0
2. ??? 100.0 5 0.0 0.0 0.0 0.0 0.0
In the tcpdump output I can see the ICMP time-exceeded replies coming back:
10:16:28.158888 IP 172.28.171.2 > 78.46.78.198: ICMP echo request, id 2092, seq 59520, length 44
10:16:28.159363 IP 172.28.171.1 > 172.28.171.2: ICMP time exceeded in-transit, length 72
10:16:28.259153 IP 172.28.171.2 > 78.46.78.198: ICMP echo request, id 2092, seq 59776, length 44
10:16:28.359546 IP 172.28.171.2 > 78.46.78.198: ICMP echo request, id 2092, seq 60032, length 44
10:16:28.408129 IP 10.9.208.1 > 172.28.171.2: ICMP time exceeded in-transit, length 36
10:16:28.428193 IP 10.9.208.2 > 172.28.171.2: ICMP time exceeded in-transit, length 36
10:16:28.459953 IP 172.28.171.2 > 78.46.78.198: ICMP echo request, id 2092, seq 60288, length 44
10:16:28.560260 IP 172.28.171.2 > 78.46.78.198: ICMP echo request, id 2092, seq 60544, length 44
10:16:28.618138 IP 10.9.213.6 > 172.28.171.2: ICMP time exceeded in-transit, length 36
10:16:28.660678 IP 172.28.171.2 > 78.46.78.198: ICMP echo request, id 2092, seq 60800, length 44
10:16:28.708130 IP 10.9.212.253 > 172.28.171.2: ICMP time exceeded in-transit, length 36
10:16:28.730193 IP 213.158.195.13 > 172.28.171.2: ICMP time exceeded in-transit, length 36
10:16:28.761086 IP 172.28.171.2 > 78.46.78.198: ICMP echo request, id 2092, seq 61056, length 44
10:16:28.861380 IP 172.28.171.2 > 78.46.78.198: ICMP echo request, id 2092, seq 61312, length 44
10:16:28.938167 IP 213.248.89.153 > 172.28.171.2: ICMP time exceeded in-transit, length 36
But with strace on mtr I can see that these ICMP replies are never delivered to mtr!
I think the reason might be that the source IP of the ICMP response arrives on the "wrong" interface, i.e. the ICMP reply comes from (for example) 10.9.212.253 (some intermediate router), and traffic to this IP would be routed via eth0, while the reply arrives on eth2.
Is this a sensible explanation? What can I do to make mtr work for my "special" network as well?
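The hypothesis above can be checked by asking the kernel which interface it would use to reach the source address of a reply, and comparing that with the interface the reply arrived on. The following is a minimal sketch, assuming the iproute2 `ip` tool is installed; the function names `reverse_path_dev` and `would_pass_strict_rpf` are made up for this illustration, and the comparison only approximates a strict reverse-path check, it is not the kernel's own implementation.

```shell
#!/bin/sh
# Hypothetical helper: print the interface the kernel would use to reach
# the address given as $1, by picking the word after "dev" in the output
# of `ip route get`.
reverse_path_dev() {
    ip route get "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: succeed only if a packet with source address $1
# arriving on interface $2 came in on the same interface the kernel would
# use to send traffic back to that address.
would_pass_strict_rpf() {
    [ "$(reverse_path_dev "$1")" = "$2" ]
}
```

With the routing table shown earlier, 10.9.212.253 matches only the default route, so `would_pass_strict_rpf 10.9.212.253 eth2 || echo "reverse path mismatch"` would be expected to report a mismatch, while `would_pass_strict_rpf 172.28.171.1 eth2` should succeed.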
The iptables rules are set up using:
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -i eth1 -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i eth1 -j ACCEPT
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -o eth2 -j MASQUERADE
iptables -A INPUT -j LOG --log-prefix 'IPTABLES: '
iptables -A FORWARD -j LOG --log-prefix 'IPTABLES: '
But I don't see any ICMP-related packets in kern.log.
Thanks to Rafał Ramocki, the solution is simple: you have to turn off rp_filter on the eth2 interface:
echo 0 > /proc/sys/net/ipv4/conf/eth2/rp_filter
From kernel docs:
rp_filter
---------
Integer value determines if a source validation should be made. 1 means yes, 0
means no. Disabled by default, but local/broadcast address spoofing is always
on.
If you set this to 1 on a router that is the only connection for a network to
the net, it will prevent spoofing attacks against your internal networks
(external addresses can still be spoofed), without the need for additional
firewall rules.
While this is nice for preventing spoofing attacks (at least some of them), it definitely breaks some functionality if you have more than one internet connection.
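To see where reverse-path filtering is still active, the per-interface values can be listed. The sketch below rests on one assumption that goes beyond the documentation excerpt quoted above: more recent kernel documentation states that the value actually applied to an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter, so setting only the eth2 entry to 0 may not be enough when the "all" entry is non-zero. The function name `show_rp_filter` and its optional directory argument are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the configured rp_filter value of every
# interface together with the value that would apply under the
# max(all, interface) rule (an assumption, see the text above).
# $1 optionally overrides the sysctl directory, e.g. to try the function
# on a copy of the tree.
show_rp_filter() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    # Treat a missing "all" entry as 0.
    all=$(cat "$conf_dir/all/rp_filter" 2>/dev/null || echo 0)
    for f in "$conf_dir"/*/rp_filter; do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        # "all" and "default" are settings, not real interfaces.
        case "$iface" in all|default) continue ;; esac
        val=$(cat "$f")
        if [ "$all" -gt "$val" ]; then eff=$all; else eff=$val; fi
        printf '%s: configured=%s effective=%s\n' "$iface" "$val" "$eff"
    done
}
```

Calling `show_rp_filter` without arguments reads the live values. If eth2 shows `configured=0` but a non-zero effective value, the "all" entry is probably the one still enforcing the check.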