Advanced routing with firewall marks and `rp_filter`

Today is routing day. To follow this, you should know how to add an IP route, the basics of ip rule, and have a good understanding of iptables.

This is cross-posted (in French) on LinuxFr.

Problem

I have an ADSL modem (Freebox) in bridge mode, and an optical fibre box (Livebox) in DMZ mode. All traffic except a selected part goes via ADSL, and the selected part goes via fibre. This is my setup; in the opposite case, the problem would be the same.

To keep things simple, I'll only talk about traffic going through the router (running Debian GNU/Linux), not about traffic generated by the router itself.

My ADSL address is 82.236.xxx.xxx and fibre is 90.76.xxx.xxx.

What should work

We take for granted that the mangle table is empty.

> ip route show table livebox
default via 192.168.1.1 dev eth_livebox src 192.168.1.253 
82.236.xxx.0/24 dev eth_adsl scope link src 82.236.xxx.xxx 
192.168.0.0/24 dev bridge_local scope link src 192.168.0.253 
192.168.1.0/24 dev eth_livebox scope link src 192.168.1.253 

iptables -t mangle -I PREROUTING --destination 23.23.114.123 -j MARK --set-mark 1
ip rule add from all fwmark 0x1 lookup livebox

This does not work: when I run

curl http://api.ipify.org/ --resolve api.ipify.org:80:23.23.114.123

from a LAN client, nothing is returned.

What works

The livebox table is unchanged. However, after flushing the mangle table, we fill it with this:

iptables -t mangle -I PREROUTING --source 23.23.114.123 -j TOS --set-tos 0x10
iptables -t mangle -I PREROUTING --destination 23.23.114.123 -j TOS --set-tos 0x10
ip rule add from all tos 0x10 lookup livebox

And then:

> curl http://api.ipify.org/ --resolve api.ipify.org:80:23.23.114.123
90.76.xxx.xxx

Why?

Between the two snippets, I changed two things:

  1. I used the TOS field of the IP header instead of a firewall mark, which is managed internally by the kernel and its modules.
  2. I marked the returning packets.

I lied (by omission): rp_filter

I forgot to say that rp_filter is set to 1 on all interfaces. According to the kernel documentation, the value 1 stands for strict reverse path checking as defined in RFC 3704.

To summarize, when a packet comes in on an interface, the kernel swaps the source and destination IP address fields and tries to route this new fake packet. If the chosen route goes out through the interface the packet came in on, the check passes. Otherwise, the packet is dropped.
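
A rough way to see what such a lookup would answer is ip route get, which accepts mark and tos selectors. The helper below is only a sketch: forward_route is a name made up for illustration, 192.168.0.10 is a hypothetical LAN client, and the query imitates a packet forwarded from the LAN towards the remote host, which approximates the reversed packet rather than reproducing the kernel's exact check.

```shell
# Sketch: ask the kernel which route it would pick for a packet going from a
# LAN client to a remote host, optionally with extra selectors.
# Arguments: client address, remote address, LAN interface, then any extra
# "ip route get" selectors such as "mark 1" or "tos 0x10".
forward_route() {
    client=$1 remote=$2 lan_if=$3
    shift 3
    ip route get "$remote" from "$client" iif "$lan_if" "$@"
}

# Intended use on the router (192.168.0.10 is a hypothetical LAN client):
#   forward_route 192.168.0.10 23.23.114.123 bridge_local            # no mark
#   forward_route 192.168.0.10 23.23.114.123 bridge_local mark 1     # fwmark 0x1
#   forward_route 192.168.0.10 23.23.114.123 bridge_local tos 0x10   # TOS 0x10
```

If the device reported for a given selector is not the interface the real reply arrives on, a strict reverse path check using that same selector would be expected to reject the reply.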

So, in the What should work setup, since the returning packet is not marked with 1, the strict reverse path check fails. Indeed, the returning packet comes in through eth_livebox, but without a mark, the reversed packet is routed according to the main table, which says to go out through eth_adsl. The check fails. This is the reason for change no. 2.
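
To confirm that rp_filter really is what drops the packets, the kernel keeps a counter named IPReversePathFilter in the TcpExt lines of /proc/net/netstat. Here is a minimal sketch to read it; rp_filter_drops is a hypothetical helper name, and the optional argument only exists so that the parsing can be tried on a saved copy of the file.

```shell
# Sketch: print the IPReversePathFilter counter (packets rejected by the
# reverse path check). /proc/net/netstat stores a header line of names
# followed by a line of values, both prefixed with "TcpExt:".
rp_filter_drops() {
    awk '
        # First TcpExt line: find the column holding the counter name.
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++)
                if ($i == "IPReversePathFilter") col = i
            seen = 1
            next
        }
        # Second TcpExt line: print the value in the same column.
        $1 == "TcpExt:" && seen && col { print $col; exit }
    ' "${1:-/proc/net/netstat}"
}

# Intended use: run it before and after the curl test.
#   rp_filter_drops
```

If the number grows while the curl test hangs, the reverse path check is the one rejecting the replies.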

Why TOS and not MARK?

Yes, of course, I tried -j MARK on returning packets. And it does not work. After some hours of digging through old mailing list messages, I found this one:

OK, looking at fib_validate_source(), it looks like how rp_filter works is just that the kernel takes the packet, reverses src & dst addrs and interfaces, and tries to do a routing lookup. It totally ignores marking when building the routing key, but weirdly enough, it does check the TOS.

OOOOOK. So I read some documentation about TOS, and since I was still looking for a solution, I did it quick and dirty. It works. This is the reason for change no. 1.

Can it be better?

I'll let you check the code of fib_validate_source(). Honestly, it's too heavy for me.

But in my opinion, the result is inconsistent. I know that TOS is inside the IP header, and that firewall marks are specific to host internals. On the other hand, ip rule has syntax to select a routing table based either on the TOS header value (tos) or on the firewall mark value (fwmark).

I don't know yet what I should really do. Here are my conclusions, which are not mutually exclusive.

Give up rp_filter on public interfaces

The goal of rp_filter is to mitigate DDoS attacks that rely on spoofed source addresses, and also to filter rogue clients that forge packets from within my own managed network. It is a bit like SPF: it protects other parties.

On my public interfaces, I obviously have a routing entry like default via IP, so in any case rp_filter will conclude that the packet can be answered. Indeed, if a packet makes it to my router, it is because my ISP let it through. And they managed to route it.

So I could give up and set rp_filter to 0 on all those interfaces (warning: the maximum of net.ipv4.conf.eth_livebox.rp_filter and net.ipv4.conf.all.rp_filter is the value applied).
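
Because of this maximum rule, setting only the per-interface value to 0 is not enough when the "all" value is still 1. A small sketch to check what actually applies to an interface; both function names are made up for illustration.

```shell
# Sketch: the applied rp_filter value is the maximum of the "all" value and
# the per-interface value.
effective_rp_filter() {
    # $1 = net.ipv4.conf.all.rp_filter, $2 = per-interface rp_filter
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Read both sysctls for one interface and print the value that applies.
show_effective_rp_filter() {
    iface=$1
    all=$(sysctl -n net.ipv4.conf.all.rp_filter)
    own=$(sysctl -n "net.ipv4.conf.$iface.rp_filter")
    echo "$iface: $(effective_rp_filter "$all" "$own")"
}

# Intended use on the router:
#   show_effective_rp_filter eth_livebox
```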

EDIT: Use rpfilter from iptables

Someone on LinuxFr brought this to my attention: the rp_filter control is deprecated, or at least in an abandoned state. There is in fact an rpfilter module for iptables, which is the way forward. As an example, taken from here:

iptables -A PREROUTING -t raw -m rpfilter --invert -j DROP
ip6tables -A PREROUTING -t raw -m rpfilter --invert -j DROP

It is well integrated in the firewall, it works, and returning packets don't even need to be marked, since they are recognized by their state.

Report this "bug" to kernel developers

It seems very inconsistent to me, and moreover very badly documented. ip rule lets you write rules that work for incoming packets but not for returning ones: that is a misbehavior.

But here I am: I don't have the time to get skilled enough to read this code, understand it, and try to fix it. And I don't even know if there is a good reason for that, like the fact that firewall marks are maybe not available when calling fib_validate_source.

But if someone here tells me that it could be reported to someone who cares, who could explain it and maybe fix or improve it, I will gladly do so.

EDIT: Maybe the documentation of the rp_filter parameter should be updated…
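
On the documentation side: the kernel's ip-sysctl documentation describes a separate setting, src_valid_mark, which makes the reverse path lookup take the packet's fwmark into account (it is off by default). This is untested on the setup above, so take it as a sketch under that assumption. enable_src_valid_mark is a made-up helper name, and the returning packets would still have to be marked, as in change no. 2.

```shell
# Sketch only: let the reverse path lookup honour fwmarks on one interface.
# $1 = interface name, e.g. eth_livebox
enable_src_valid_mark() {
    sysctl -w "net.ipv4.conf.$1.src_valid_mark=1"
}

# Intended use, as root (assumption: same addresses and table as above):
#   enable_src_valid_mark eth_livebox
#   iptables -t mangle -I PREROUTING --source 23.23.114.123 -j MARK --set-mark 1
#   iptables -t mangle -I PREROUTING --destination 23.23.114.123 -j MARK --set-mark 1
#   ip rule add from all fwmark 0x1 lookup livebox
```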