iptables/nftables: how to exclude all forwarded traffic from connection tracking on a router?

Solution 1:

For a generic ruleset, one can ask nftables to do a route lookup in advance using the fib expression instead of waiting for the routing stack to do it. This makes it possible to match on the (future) output interface even though it isn't known yet (the routing decision hasn't happened), at the cost of an extra lookup. Then, if the result says the packet will be routed, prevent it from being tracked with a notrack statement.

FIB EXPRESSIONS

fib {saddr | daddr | mark | iif | oif} [. ...] {oif | oifname | type}

A fib expression queries the fib (forwarding information base) to obtain information such as the output interface index a particular address would use. The input is a tuple of elements that is used as input to the fib lookup functions.

NOTRACK STATEMENT

The notrack statement allows to disable connection tracking for certain packets.

notrack

Note that for this statement to be effective, it has to be applied to packets before a conntrack lookup happens. Therefore, it needs to sit in a chain with either prerouting or output hook and a hook priority of -300 or less.

So one should do a "simple" route check from prerouting, using only the destination address as selector, and check for the existence of an output interface (non-routable packets and packets destined for the host itself won't resolve to any). There's an exception for the lo (loopback) interface, to keep it tracked: although it carries local traffic, a packet sent by the host to itself (through the output path) comes back through the prerouting path and does have lo as its output interface. As the outgoing packet already created a conntrack entry, it's better to keep this consistent.

nft add table ip stateless
nft add chain ip stateless prerouting '{ type filter hook prerouting priority -310; policy accept; }'
nft add rule ip stateless prerouting iif != lo fib daddr oif exists notrack

Replacing the ip family with the inet combo family should extend the same generic behavior to IPv4+IPv6.
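
As a sketch of the inet variant, the same table can also be written in ruleset-file form. The snippet below only writes the file; the path is an arbitrary placeholder, and loading it is left as a separate step. It assumes fib behaves in the inet family as described above, which is worth verifying on the target kernel.

```shell
#!/bin/sh
# Sketch only: write an inet (IPv4+IPv6) version of the stateless table
# to a file. The file location is an arbitrary placeholder.
ruleset_file="${TMPDIR:-/tmp}/stateless-inet.nft"

cat > "$ruleset_file" <<'EOF'
table inet stateless {
    chain prerouting {
        type filter hook prerouting priority -310; policy accept;
        # Same rule as in the ip table: skip loopback input, then
        # notrack anything the route lookup resolves to an output interface.
        iif != "lo" fib daddr oif exists notrack
    }
}
EOF

echo "Ruleset written to $ruleset_file"
```

The file can be syntax-checked with nft -c -f and loaded with nft -f (both typically need root). Loading it twice appends the rule twice, so delete the table first when reloading.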

To be more specific one could specify the future output interface with fib daddr oif eth1 for example, which is more or less the equivalent of oif eth1, but also available in prerouting.

Of course, if the topology is known in advance, it's possible to avoid a FIB lookup by using one or a few rules based on address tests, since the administrator then already knows the routes. Benchmarking might be needed to know whether this actually performs better than the generic method.

For example, with OP's provided information, replacing the previous rule with:

nft add rule ip stateless prerouting 'ip daddr != { 192.168.1.1, 192.168.2.1, 127.0.0.0/8 } notrack'

should have a near-equivalent effect. 127.0.0.0/8 is present for the same reasons as above with the lo interface.

Handling of broadcast (like 192.168.1.255 received on eth0) and multicast (like link-local 224.0.0.1 received on an interface) might differ between the two methods, or not behave as expected, and may require additional rules for specific needs, especially with the second method. Tracking broadcast and multicast is rarely useful anyway: a reply can't use the original broadcast or multicast destination address as its source, so the conntrack entry will never "see" bidirectional traffic. It therefore usually doesn't matter much for stateful rules.
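
If broadcast and multicast need deterministic handling, one option is to classify them explicitly with a fib type lookup before the generic rule. The following is an untested sketch that leaves them untracked regardless of what the oif lookup returns. The file path is a placeholder, and whether "untracked" is the right choice depends on the stateful rules in use.

```shell
#!/bin/sh
# Sketch only: variant of the ip table with explicit broadcast/multicast
# handling. The file location is an arbitrary placeholder.
ruleset_file="${TMPDIR:-/tmp}/stateless-bcast.nft"

cat > "$ruleset_file" <<'EOF'
# Recreate the table from scratch so rules are not duplicated on reload.
add table ip stateless
delete table ip stateless

table ip stateless {
    chain prerouting {
        type filter hook prerouting priority -310; policy accept;
        # Assumption: broadcast and multicast should never be tracked.
        fib daddr . iif type { broadcast, multicast } notrack
        # Generic rule from above for routed unicast traffic.
        iif != "lo" fib daddr oif exists notrack
    }
}
EOF

echo "Ruleset written to $ruleset_file"
```

To keep broadcast and multicast tracked instead, the first rule could end in accept rather than notrack, so that the generic rule is never reached for those packets.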


Notes

  • This will usually not be compatible with stateful NAT.

    My understanding is that with DNAT toward a remote host, the reply traffic won't be de-NATed, so the flow will fail, and that SNAT on forwarded traffic won't trigger since no conntrack entry was created. The rarely used SNAT in the input path should be fine, and a combination of DNAT+SNAT (using a local source address) might also work, since there's then a local destination involved in both the original and reply directions, so a conntrack entry should always be correctly created or looked up.

  • standard ruleset

    Actual rules using iptables or nftables (in their own separate table) can then be written as usual, including stateful rules for the host itself. As routed traffic won't create conntrack entries, any rules involving such traffic should be strictly stateless and not rely on ct expressions such as ct state new or established, which would never match (untracked packets only ever match ct state untracked).

  • verifying behavior

    One can check the overall behavior even without proper firewall rules by:

    • using a dummy ct rule to be sure the conntrack facility gets registered in the current network namespace.

      nft add table ip mytable
      nft add chain ip mytable mychain '{ type filter hook prerouting priority -150; policy accept; }'
      nft add rule ip mytable mychain ct state new
      
    • using the conntrack tool to follow events:

      conntrack -E
      
    • generating traffic from a remote host

      NEW conntrack entries will then be created for traffic destined to the router itself, but not for routed traffic.
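
To illustrate the "standard ruleset" note above, here is a minimal sketch of a separate filter table that mixes stateful rules for the router itself with stateless rules for routed traffic. The table name, the eth0/eth1 pairing, and the ICMP and SSH rules are illustrative placeholders, not taken from the original setup, and the drop policies should be adapted before any real use.

```shell
#!/bin/sh
# Sketch only: stateful rules for the router itself, stateless rules for
# routed traffic. Table name, interface names and allowed services are
# illustrative placeholders.
ruleset_file="${TMPDIR:-/tmp}/filter-example.nft"

cat > "$ruleset_file" <<'EOF'
table ip filter_example {
    chain input {
        type filter hook input priority 0; policy drop;
        # Traffic to the host is still tracked, so ct state works here.
        iifname "lo" accept
        ct state established,related accept
        icmp type echo-request accept
        tcp dport 22 accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
        # Routed traffic is untracked: allow each direction explicitly.
        iifname "eth0" oifname "eth1" accept
        iifname "eth1" oifname "eth0" accept
    }
}
EOF

echo "Ruleset written to $ruleset_file"
```

Because nothing in the forward chain can rely on an established state, replies have to be permitted by their own rule, which is why both directions are listed.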