Advanced routing with firewall marks and `rp_filter`
Today is routing day. To follow along, you should know how to add an IP route, the basics of ip rule, and have a good understanding of iptables.
This is cross posted (in French) on LinuxFr.
Problem
I have an ADSL modem (Freebox) in bridge mode and an optical fibre box (Livebox) in DMZ mode. All traffic except a selected subset goes via ADSL; the selected traffic goes via fibre. This is my setup, but the problem would be the same with the roles reversed.
To keep things simple, I'll only talk about traffic going through the router, which runs Debian GNU/Linux, and not about traffic generated by the router itself.
My ADSL address is 82.236.xxx.xxx and fibre is 90.76.xxx.xxx.
What should work
We take for granted that the mangle table is empty.
> ip route show table livebox
default via 192.168.1.1 dev eth_livebox src 192.168.1.253
82.236.xxx.0/24 dev eth_adsl scope link src 82.236.xxx.xxx
192.168.0.0/24 dev bridge_local scope link src 192.168.0.253
192.168.1.0/24 dev eth_livebox scope link src 192.168.1.253
iptables -t mangle -I PREROUTING --destination 23.23.114.123 -j MARK --set-mark 1
ip rule add from all fwmark 0x1 lookup livebox
This does not work: when I run
curl http://api.ipify.org/ --resolve api.ipify.org:80:23.23.114.123
from a LAN client, nothing is returned.
What works
The livebox table is unchanged. However, after flushing the mangle table, we fill it with this:
iptables -t mangle -I PREROUTING --source 23.23.114.123 -j TOS --set-tos 0x10
iptables -t mangle -I PREROUTING --destination 23.23.114.123 -j TOS --set-tos 0x10
ip rule add from all tos 0x10 lookup livebox
And then:
> curl http://api.ipify.org/ --resolve api.ipify.org:80:23.23.114.123
90.76.xxx.xxx
Why?
Between the two snippets, I changed two things:
- I used the TOS field of the IP header instead of a firewall mark, which is managed internally by the kernel and its modules.
- I marked the returning packets.
I lied (by omission): rp_filter
I forgot to say that rp_filter is set to 1 on all interfaces.
According to the kernel documentation, the value 1 means strict reverse path checking as defined in RFC 3704.
To summarize: when a packet comes in on an interface, the kernel swaps the source and destination IP address fields and tries to route this fake packet. If the chosen route goes out through the interface the packet came in on, the check passes. Otherwise, the packet is dropped.
So, in What should work, since the returning packet is not marked with 1, the strict reverse path check fails. The returning packet comes in through eth_livebox but, without a mark, the reverse lookup uses the main table, which says to go out through eth_adsl. The check fails. This is the reason for change no. 2.
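One way to confirm that the strict check is what eats the replies is to watch the kernel's reverse path filter drop counter while running the curl test. Here is a minimal sketch, assuming a Linux `/proc/net/netstat` that exposes `IPReversePathFilter` in its `TcpExt` lines; the helper name `netstat_counter` is mine, not a standard tool.

```shell
#!/usr/bin/env bash
# Print the value of one TcpExt counter from a netstat-format file.
# The file holds pairs of lines: a "TcpExt:" header with counter names,
# then a "TcpExt:" line with the values in the same column order.
# netstat_counter is a hypothetical helper name used for this example.
netstat_counter() {
    local name="$1" file="${2:-/proc/net/netstat}"
    awk -v want="$name" '
        # First TcpExt line: remember the column of every counter name.
        $1 == "TcpExt:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        # Second TcpExt line: print the value in the remembered column.
        $1 == "TcpExt:" && seen  { if (want in col) print $(col[want]); exit }
    ' "$file"
}

# Usage on the router (read-only):
#   netstat_counter IPReversePathFilter
```

Read the counter, run the curl from a LAN client, then read it again. If the number went up, packets were dropped by the reverse path check.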
Why TOS and not MARK?
Yes, of course, I tried -j MARK on the returning packets, and it does not work. After some hours of digging through old mailing-list messages, I found this one:
OK, looking at fib_validate_source(), it looks like how rp_filter works is just that the kernel takes the packet, reverses src & dst addrs and interfaces, and tries to do a routing lookup. It totally ignores marking when building the routing key, but weirdly enough, it does check the TOS.
OOOOOK. So I read some documentation about TOS and, since I was just looking for a solution, I did it quick and dirty. It works. This is the reason for change no. 1.
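To make the two changes concrete, here is a toy model of the strict check as described above. It is not kernel code: it hard-codes my two routing decisions (TOS 0x10 goes to the livebox table, everything else falls through to main via eth_adsl) and, following the quoted message, ignores the firewall mark in the reverse lookup. The function names are made up for the illustration.

```shell
#!/usr/bin/env bash
# Toy model of rp_filter=1 for this setup. Hypothetical helper names.

# Which device would the reversed packet leave through?
# The mark is accepted but deliberately unused, mirroring the claim that
# the reverse lookup ignores it while still honouring the TOS.
reverse_lookup_dev() {
    local tos="$1" mark="$2"
    if [ "$tos" = "0x10" ]; then
        echo eth_livebox   # "ip rule add from all tos 0x10 lookup livebox"
    else
        echo eth_adsl      # falls through to the main table
    fi
}

# Strict check: accept only if the reverse route uses the incoming device.
strict_rp_check() {
    local in_dev="$1" tos="$2" mark="$3"
    if [ "$(reverse_lookup_dev "$tos" "$mark")" = "$in_dev" ]; then
        echo accept
    else
        echo drop
    fi
}

strict_rp_check eth_livebox 0x00 0x1   # reply marked with fwmark only
strict_rp_check eth_livebox 0x10 0x0   # reply tagged with TOS 0x10
```

In this model the first call prints drop and the second prints accept, which matches what I observed: marking the reply is not enough, tagging it with the TOS is.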
Can it be better?
I'll let you check the code of fib_validate_source(). Honestly, it's too heavy for me.
But in my opinion, the result is inconsistent. I know that TOS is part of the IP header and that firewall marks are internal to the host. On the other hand, ip rule has syntax to select a route either on the TOS header value or on the firewall mark value with fwmark.
I don't know yet what I really should do; here are my conclusions, which are not mutually exclusive.
Give up rp_filter on public interfaces
The goal of rp_filter is to limit DDoS attacks, but also to filter out rogue clients that forge packets from within my own managed network. It is a bit like SPF: it protects other actors.
On my public interfaces, I obviously have a routing entry like default via IP, so rp_filter will conclude that the packet can be answered anyway. Indeed, if a packet reaches my router, it is because my ISP let it through and managed to route it.
So I could give up and set rp_filter to 0 on all those interfaces (warning: the maximum of net.ipv4.conf.eth_livebox.rp_filter and net.ipv4.conf.all.rp_filter is applied).
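Because of that maximum rule, setting only the per-interface value to 0 does nothing while the all value is still 1. Here is a small sketch to see which value actually applies; the function names are mine, and the second one assumes the usual /proc/sys/net/ipv4/conf layout.

```shell
#!/usr/bin/env bash
# Pure helper: the larger of the "all" and per-interface values wins.
max_rp_filter() {
    local all="$1" dev="$2"
    if [ "$all" -gt "$dev" ]; then echo "$all"; else echo "$dev"; fi
}

# Read the live values for one interface (Linux only, read-only).
effective_rp_filter() {
    local dev="$1" base=/proc/sys/net/ipv4/conf
    max_rp_filter "$(cat "$base/all/rp_filter")" "$(cat "$base/$dev/rp_filter")"
}

# Usage on the router:
#   effective_rp_filter eth_livebox
```

So to really disable the check on eth_livebox, both net.ipv4.conf.all.rp_filter and net.ipv4.conf.eth_livebox.rp_filter have to be 0.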
EDIT: Use rpfilter from iptables
Someone on LinuxFr brought this to my attention: the rp_filter control is deprecated, or at least in an abandoned state. There is indeed an rpfilter module for iptables, which is its future. As an example, taken from here:
iptables -A PREROUTING -t raw -m rpfilter --invert -j DROP
ip6tables -A PREROUTING -t raw -m rpfilter --invert -j DROP
It is well integrated in the firewall, it works, and returning packets don't even need to be marked, since they are recognized by their state.
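For those who would still rather keep fwmark-based rules, the iptables-extensions man page documents a --validmark option for the rpfilter match, which includes the packet's mark in the reverse lookup. I have not tested the following on my router, so take it as a sketch: it assumes the fwmark rule and the --destination MARK rule from What should work are in place, and that the sysctl rp_filter is 0 so the kernel check does not drop the packet first. The run wrapper and the function name are mine; by default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Prints the commands by default; set APPLY=1 (as root) to really run them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

setup_rpfilter_validmark() {
    # Mark replies from the test host so the reverse lookup can match
    # "ip rule add from all fwmark 0x1 lookup livebox".
    run iptables -t mangle -A PREROUTING --source 23.23.114.123 -j MARK --set-mark 1
    # Reverse path check that takes the mark into account. It sits in the
    # mangle table, after the MARK rule, so the mark is already set.
    run iptables -t mangle -A PREROUTING -m rpfilter --validmark --invert -j DROP
}

setup_rpfilter_validmark
```

The order matters here: the rpfilter match is only valid in the raw and mangle tables, and raw PREROUTING runs before mangle, so a rule in raw would not see a mark set in mangle.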
Report this "bug" to kernel developers
It seems very inconsistent to me, and moreover very badly documented. ip rule lets you write rules that work for incoming packets but not for returning ones: misbehavior.
But here I am: I don't have the time to get skilled enough to read this code, understand it, and try to fix it.
And I don't even know whether there is a good reason for it, such as firewall marks perhaps not being available when fib_validate_source is called.
But if someone here tells me that it could be reported to someone who cares, who could explain it and maybe fix and improve it, I will gladly do it.
EDIT: Maybe the documentation of the rp_filter parameter should be updated…