iptables/nftables: how to exclude all forwarded traffic from connection tracking on a router?
Solution 1:
For a generic ruleset, one can ask nftables to do a route lookup in advance using the fib expression, instead of waiting for the routing stack to do it. This makes the (future) output interface usable even though it isn't known yet (no routing decision has happened), at the cost of an extra lookup. Then, if the result says the packet will be routed, prevent it from being tracked using a notrack statement.
FIB EXPRESSIONS
fib {saddr | daddr | mark | iif | oif} [. ...] {oif | oifname | type}
A fib expression queries the fib (forwarding information base) to obtain information such as the output interface index a particular address would use. The input is a tuple of elements that is used as input to the fib lookup functions.
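To get a feel for the question a fib lookup answers, the kernel's routing table can be queried by hand with iproute2's ip route get. This is only a rough userspace analogue (it is not evaluated on a packet in prerouting), and both addresses below are hypothetical examples: 192.168.2.10 stands for some host on a routed network, and 192.168.1.1 is assumed to be one of the router's own addresses.

```shell
# Which output device would a packet to this (hypothetical) host use?
# The answer names the chosen device ("dev ...") and source address.
ip route get 192.168.2.10

# For an address owned by the router itself the route type is "local":
# the packet would be delivered to the host, not forwarded.
ip route get 192.168.1.1
```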
NOTRACK STATEMENT
The notrack statement allows to disable connection tracking for certain packets.
notrack
Note that for this statement to be effective, it has to be applied to packets before a conntrack lookup happens. Therefore, it needs to sit in a chain with either prerouting or output hook and a hook priority of -300 or less.
So one should do a "simple" route check from prerouting, using only the destination address as the selector, and check whether an output interface exists (non-routable packets and packets intended for the host won't resolve to any). There's an exception for the lo (loopback) interface, to keep that traffic tracked: while it represents local traffic, a packet the host sends to itself (through the output path) comes back through the prerouting path and does resolve to an output interface, lo. As the outgoing packet already created a conntrack entry, it's better to keep this consistent.
nft add table ip stateless
nft add chain ip stateless prerouting '{ type filter hook prerouting priority -310; policy accept; }'
nft add rule ip stateless prerouting iif != lo fib daddr oif exists notrack
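The same table, chain and rule can also be loaded as a single atomic transaction with nft -f, which is convenient for boot scripts. This is a sketch meant to replace the three commands above, not to run in addition to them; the counter statement is an addition (not part of the original rule) so the number of exempted packets can be observed.

```shell
# Load the whole ruleset in one transaction from a here-document.
nft -f - <<'EOF'
# Create the table if missing, then delete it, so a reload starts clean.
table ip stateless
delete table ip stateless

table ip stateless {
    chain prerouting {
        type filter hook prerouting priority -310; policy accept;
        # Packets whose destination resolves to an output interface will be
        # routed: skip conntrack for them. lo stays tracked (host to itself).
        # "counter" is only there to observe how many packets match.
        iif != "lo" fib daddr oif exists counter notrack
    }
}
EOF

# Show the rule together with its packet/byte counter.
nft list chain ip stateless prerouting
```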
Replacing the ip family with the inet (combined IPv4/IPv6) family should extend the same generic behavior to both IPv4 and IPv6.
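A sketch of that inet variant, assuming it is used instead of the ip table rather than alongside it; only the family keyword changes, and the fib expression is then evaluated for both IPv4 and IPv6 packets.

```shell
# Same generic rule, but in an inet table covering IPv4 and IPv6.
nft add table inet stateless
nft add chain inet stateless prerouting '{ type filter hook prerouting priority -310; policy accept; }'
# Routed packets (an output interface exists) are not tracked; lo stays tracked.
nft add rule inet stateless prerouting iif != lo fib daddr oif exists notrack
```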
To be more specific, one could specify the future output interface with fib daddr oif eth1, for example, which is more or less the equivalent of oif eth1, but also usable in prerouting.
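As an illustration, assuming eth1 is the interface toward the routed network (a hypothetical name here), the interface-specific form would replace the generic rule like this. With oif, nft resolves the name to an interface index when the rule is added, so the interface must already exist; the oifname form compares names instead.

```shell
# Only exempt packets that will be routed out through eth1.
nft add rule ip stateless prerouting fib daddr oif eth1 notrack

# Alternative comparing the interface name at packet time, so eth1 does not
# need to exist when the rule is added:
# nft add rule ip stateless prerouting 'fib daddr oifname "eth1" notrack'
```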
Of course, if the topology is known in advance, it's possible to avoid a FIB lookup by using one or a few rules based on address tests, since the administrator already knows the routes. Benchmarking might be needed to find out whether this performs better than keeping the generic method.
For example, with the information provided by the OP, replacing the previous rule with:
nft add rule ip stateless prerouting 'ip daddr != { 192.168.1.1, 192.168.2.1, 127.0.0.0/8 } notrack'
should have a near-equivalent effect. 127.0.0.0/8 is included for the same reason as the lo exception above.
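A possible refinement of this address-based rule, sketched here as an assumption rather than taken from the question, is to keep the router's own addresses in a named set (the name local_addrs is hypothetical), so they can be updated later without rewriting the rule. It is meant to replace the rule above, not to be added next to it.

```shell
# Named set of the router's own addresses; "flags interval" is required
# because the set also holds a prefix (127.0.0.0/8).
nft add set ip stateless local_addrs '{ type ipv4_addr; flags interval; }'
nft add element ip stateless local_addrs '{ 192.168.1.1, 192.168.2.1, 127.0.0.0/8 }'

# Anything not destined to one of these addresses is considered routed.
nft add rule ip stateless prerouting ip daddr != @local_addrs notrack
```

Adding or removing an address later is then a matter of nft add element or nft delete element on the set.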
Handling of broadcast (like 192.168.1.255 received on eth0) and multicast (like the link-local 224.0.0.1 received on an interface) might not work the same way with both methods, nor as expected, and could require additional rules for specific needs, especially with the second method. Since tracking broadcast and multicast is rarely useful anyway (a reply's source won't, and can't, be the original broadcast or multicast destination address, so the conntrack entry will never "see" bidirectional traffic), it usually doesn't matter much for stateful rules.
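One way to make this explicit, so that both methods treat such packets the same way, is a hypothetical extra rule placed first in the chain that exempts broadcast and multicast destinations from tracking. It relies on the fib expression's type result; it is a sketch for specific needs, not a required part of the setup.

```shell
# Inserted before the existing rules of the chain: explicitly untrack
# broadcast and multicast destinations (lo excluded, as before).
nft insert rule ip stateless prerouting 'iif != lo fib daddr type { broadcast, multicast } notrack'
```

Using accept instead of notrack would do the opposite: it ends this chain's evaluation before the generic rule is reached, so such packets would stay tracked.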
Notes
- This will usually not be compatible with stateful NAT.
  My understanding is that DNAT toward a remote host will fail because its reply traffic won't be de-NATed, and that forwarded SNAT won't trigger at all since no conntrack entry was created. The rarely used SNAT in the input path should be fine, and a combination of DNAT+SNAT (using a local source address) might also work, since then a local destination is involved in both the original and the reply directions, so a conntrack entry should always be correctly created or looked up.
- standard ruleset
  Actual rules using iptables or nftables (in their own separate table) can then be written as usual, including stateful rules for the host itself. As routed traffic won't create conntrack entries, any rules involving such traffic should stay stateless and not use ct expressions, because they would never match (except ct state untracked, which matches exactly these packets).
- verifying behavior
  One can check the overall behavior even without proper firewall rules by:
  - using a dummy ct rule, to be sure the conntrack facility gets registered in the current network namespace:
    nft add table ip mytable
    nft add chain ip mytable mychain '{ type filter hook prerouting priority -150; policy accept; }'
    nft add rule ip mytable mychain ct state new
  - using the conntrack tool to follow events:
    conntrack -E
  - generating traffic from remote.
  NEW conntrack entries will then be created for traffic to be received by the router, but not for routed traffic.
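To make the "standard ruleset" note concrete, here is a minimal sketch of a separate filtering table. The table name filter, the interfaces eth0/eth1 and the SSH port are hypothetical examples, not taken from the question. Routed traffic is filtered statelessly, which means each direction has to be allowed explicitly, while traffic to the host itself is still tracked and can use ct. With iif/oif the interfaces must exist when the rules are added (iifname/oifname avoid this).

```shell
# Hypothetical filtering table, kept separate from the "stateless" table.
nft add table ip filter

# Forwarded traffic has no conntrack entry, so no ct expressions here:
# allow SSH from eth0 to eth1 and, explicitly, its replies back.
nft add chain ip filter forward '{ type filter hook forward priority 0; policy drop; }'
nft add rule ip filter forward iif eth0 oif eth1 tcp dport 22 accept
nft add rule ip filter forward iif eth1 oif eth0 tcp sport 22 accept

# Traffic to the host itself is still tracked, so stateful rules work there.
nft add chain ip filter input '{ type filter hook input priority 0; policy accept; }'
nft add rule ip filter input ct state invalid drop
nft add rule ip filter input ct state established,related accept
```

While verifying, conntrack -E -e NEW restricts the displayed events to new entries, and nft delete table ip mytable removes the dummy table once done.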