Nftables DNAT doesn't seem to be working

I am trying to set up DNAT on my new CentOS 8 box using nftables. This utility (and CentOS 8) is new to me; I've been using iptables (on CentOS up to version 6) for ages.

My assumption is that I did not set something up properly for DNAT to kick in, but I may just not be using the tools properly. Or both.

Anyhow, in case it matters, here's a previous question of mine on some routing issues on the same box: Multiple internet connections, incoming packets on wrong NIC port (inbound routing issue?) (problem was an ARP flux, solved).

Below is a sketch of my current setup; the blue line marks what I want to happen (the expected "path").

[sketch of the network setup]

Basically, packets come in from the internet on interface 2 (ens2) and are DNATed through the local interface (ens5, local IP 192.168.1.10) to 192.168.1.2. (Once this is working, the same will be set up for ens3 and ens4, going to a couple of different VMs on the same LAN.)

I have verified that packets come in on the correct interface (the expected nft log triggers), however, conntrack -E does not show anything.

Also, the iptables logging on the centos 6 box (the actual target, 192.168.1.2) does not show anything (the same logging put in place ages ago was showing expected output the last time I checked, a few months ago, so that box should in theory be ok)

Here's my nftables script as it is right now, with IPs and interface names translated to match the sketch.

table ip nat {
    chain PREROUTING {
            type nat hook prerouting priority -100; policy accept;
            iif "ens2" goto PREROUTING_RDS2
            iif "ens3" goto PREROUTING_RDS3
    }

    chain PREROUTING_RDS2 {
            tcp dport { http, https } log prefix "RDS2_dnat-3 "
            tcp dport { http, https } dnat to IP_6
    }

    chain PREROUTING_RDS3 {
            tcp dport { http, https } log prefix "RDS3_dnat-3 "
            tcp dport { http, https } dnat to IP_6
    }
}

table inet filter {
    chain INPUT {
            type filter hook input priority 0; policy drop;
            #
            iif "lo" accept
            #
            # allow ping
            ip protocol icmp icmp type echo-request limit rate 1/second log prefix "PING "
            ip protocol icmp icmp type echo-request limit rate 1/second accept
            # following is required and must be BEFORE the ct state established otherwise the ping flooding will not be stopped
            ip protocol icmp drop
            #
            ct state established,related accept
            ct status dnat accept
            #
            iifname "ens5" goto INPUT_LOCAL
            #
            # now we drop the rest
            ct state invalid log prefix "INPUT_STATE_INVALID_DROP: "
            ct state invalid drop
            log prefix "INPUT_FINAL_REJECT: "
            reject with icmpx type admin-prohibited
    }

    chain FILTER {
            type filter hook forward priority 50; policy drop;
            iif "ens2" goto FILTER_RDS2
            iif "ens3" goto FILTER_RDS3
    }

    chain INPUT_LOCAL {
            tcp dport ssh goto INPUT_LOCAL_ssh
    }

    chain INPUT_LOCAL_ssh {
            ip saddr IP_MY_PC accept
    }

    chain FILTER_RDS2 {
            oifname "ens5" ip daddr IP_6 tcp dport { http, https } accept
    }

    chain FILTER_RDS3 {
            oifname "ens5" ip daddr IP_6 tcp dport { http, https } accept
    }
}

Thank you in advance.


Solution 1:

Actually, this question is difficult to answer without taking a good look at the previous Q/A, which solved the initial setup for centos8. The solution becomes very complex. Considering the kind of configuration that must be put in place, it's probably not worth having one IP per interface with multiple interfaces on the same LAN, rather than all IPs on the same interface, especially in a virtual environment: there won't be any speed-up. Any change to the configuration will have to be reflected in all the commands below, so managing this correctly will be difficult.


centos8 router

The multiple-interfaces-in-same-LAN problem was solved with additional routing tables. Now that centos8 is also acting as a router in this Q/A, more route entries must be duplicated from the main table to those additional routing tables:

# ip route add 192.168.1.0/24 dev ens5 table 1001 src 192.168.1.10 
# ip route add 192.168.1.0/24 dev ens5 table 1002 src 192.168.1.10 
# ip route add 192.168.1.0/24 dev ens5 table 1003 src 192.168.1.10 
# ip route add 192.168.1.0/24 dev ens5 table 1004 src 192.168.1.10 

Otherwise, any packet received on ens1, ens2, ens3 or ens4 and DNATed through ens5 will fail the reverse path filter, since those tables have no route through ens5.
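
The four commands differ only in the table number, so they can be generated in a loop. A minimal sketch, assuming the table numbers 1001 to 1004 used above; the function name print_lan_routes is made up for illustration, and the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup): print one
# "ip route add" command per additional routing table.
print_lan_routes() {
    for table in 1001 1002 1003 1004; do
        echo "ip route add 192.168.1.0/24 dev ens5 table $table src 192.168.1.10"
    done
}

# Dry run: show the commands. Pipe the output to "sh" as root to apply them.
print_lan_routes
```

Printing first makes it easy to review the commands before anything touches the routing tables.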

Of course that's not enough: the reply packets (e.g. those coming back from centos6) carry no information about which interface was used and should be reused in the other direction. So this information has to be memorized per flow, using netfilter's conntrack. In the nftables rules, delete the whole ip nat table:

# nft delete table ip nat

and replace it with this new table ip markandnat:

# nft -f - << 'EOF'
table ip markandnat {
        map iif2mark {
                type iface_index : mark;
                elements = {
                        ens1 : 101,
                        ens2 : 102,
                        ens3 : 103,
                        ens4 : 104
                }
        }

        map mark2daddr {
                type mark : ipv4_addr;
                elements = {
                        102 : 192.168.1.2,
                        103 : 192.168.1.2, # same IP, as per OP's config
                        104 : 192.168.1.4  # some other VM
                }
        }
        chain premark {
                type filter hook prerouting priority -150; policy accept;
                meta mark set ct mark meta mark != 0 return
                meta mark set iif map @iif2mark meta mark != 0 ct mark set meta mark
        }

        chain prenat {
                type nat hook prerouting priority -100; policy accept;
                tcp dport { http, https } dnat to meta mark map @mark2daddr
        }
}
EOF

This will map interface => mark => dnat destination, while saving the mark as conntrack's mark (see the link at the end about connmark usage). Now this mark will be available and used by the routing stack by adding the rules below, to point to the same additional routing tables:

# ip rule add pref 11001 fwmark 101 table 1001
# ip rule add pref 11002 fwmark 102 table 1002
# ip rule add pref 11003 fwmark 103 table 1003
# ip rule add pref 11004 fwmark 104 table 1004
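
To see what the maps and rules above add up to for each interface, here is a rough model of the lookups in plain shell. It is not nftables and does not touch the system; the function names are made up for illustration. Note that ens1 gets a mark and a routing table but has no entry in mark2daddr, so the dnat rule does not apply to it:

```shell
#!/bin/sh
# Rough model in plain shell of the lookups configured above.
# This is not nftables; all function names are hypothetical.

# map iif2mark: input interface -> mark
iif_to_mark() {
    case "$1" in
        ens1) echo 101 ;;
        ens2) echo 102 ;;
        ens3) echo 103 ;;
        ens4) echo 104 ;;
        *) return 1 ;;
    esac
}

# map mark2daddr: mark -> DNAT destination (no entry for mark 101)
mark_to_daddr() {
    case "$1" in
        102|103) echo 192.168.1.2 ;;
        104) echo 192.168.1.4 ;;
        *) return 1 ;;
    esac
}

# ip rules: fwmark 10N -> routing table 100N
mark_to_table() {
    case "$1" in
        101|102|103|104) echo $(( $1 + 900 )) ;;
        *) return 1 ;;
    esac
}

# Print mark, routing table and DNAT destination for one interface.
describe() {
    mark="$(iif_to_mark "$1")" || { echo "$1: no mark"; return 1; }
    daddr="$(mark_to_daddr "$mark")" || daddr="none"
    echo "$1: mark=$mark table=$(mark_to_table "$mark") dnat=$daddr"
}

for iface in ens1 ens2 ens3 ens4; do
    describe "$iface"
done
```

Any change to one of the maps or rules has to keep these three lookups consistent, which is the maintenance burden mentioned at the top of this answer.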

There's still a missing part, again about the reverse path filter. By default, the reverse path filter's route lookup ignores marks, so it doesn't see the routes selected through them and the check usually fails. There's actually a little-known feature, added in kernel 2.6.33/2.6.32.8 in 2009/2010, which happens to solve this problem without the need for loose reverse path mode: src_valid_mark.

# sysctl -w net.ipv4.conf.ens1.src_valid_mark=1
# sysctl -w net.ipv4.conf.ens2.src_valid_mark=1
# sysctl -w net.ipv4.conf.ens3.src_valid_mark=1
# sysctl -w net.ipv4.conf.ens4.src_valid_mark=1
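
Settings made with sysctl -w are lost at reboot. One way to keep them, sketched here under the assumption that the system reads /etc/sysctl.d/*.conf at boot (the CentOS 8 default), is to generate a small fragment. The function name and the file name in the note below are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print sysctl.d lines enabling src_valid_mark
# on the four uplink interfaces used above.
print_src_valid_mark_conf() {
    for iface in ens1 ens2 ens3 ens4; do
        echo "net.ipv4.conf.$iface.src_valid_mark = 1"
    done
}

# Dry run: show the fragment instead of writing it anywhere.
print_src_valid_mark_conf
```

As root, redirecting the output to a file such as /etc/sysctl.d/90-src-valid-mark.conf and then running sysctl --system should apply the values immediately and again at boot.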

centos6 server

If you want to use an alternate gateway temporarily, that's possible too, again by using marks, even though it adds yet more complexity and probably subtle, unforeseen side effects. As this is CentOS 6, nftables is not available, so iptables will be used.

I'll assume that the centos6 VM has IP 192.168.1.2/24 on its (only) interface eth0, and default gw 192.168.1.1. Let's add a new routing table and rule for the alternate gateway 192.168.1.10:

# ip route add table 10 default via 192.168.1.10
# ip rule add fwmark 10 lookup 10

Load the iptables rules (only the mangle table is needed here):

# iptables-restore << 'EOF'
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A PREROUTING -j CONNMARK --restore-mark
-A PREROUTING -m mark ! --mark 0 -j RETURN
-A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j MARK --set-mark 10
-A PREROUTING -i eth0 -p tcp -m tcp --dport 443 -j MARK --set-mark 10
-A PREROUTING -m mark ! --mark 0 -j CONNMARK --save-mark
-A OUTPUT -m connmark ! --mark 0 -j CONNMARK --restore-mark
COMMIT
EOF

Now any flow received on ports 80 or 443 will mark the incoming packets and their replies. This mark will be used by the routing stack to change the gateway to 192.168.1.10 for incoming and replies (mangle/OUTPUT triggers a reroute check, see 2nd link below).

It appears there's no need for src_valid_mark in this case; if it doesn't work, set it, or set rp_filter=2. Note that with this configuration, DNATed traffic can no longer also be received through 192.168.1.1.
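
When checking rp_filter, keep in mind that the kernel applies the higher of the conf/all value and the per-interface value, so looking at eth0 alone can mislead. A small sketch that computes the effective value; the function name is made up, and the optional second argument (a base directory) exists only so the function can be tried against test files:

```shell
#!/bin/sh
# Hypothetical helper: print the rp_filter value the kernel actually
# applies to an interface, i.e. the maximum of conf/all and conf/<iface>.
effective_rp_filter() {
    iface="$1"
    base="${2:-/proc/sys/net/ipv4/conf}"
    all="$(cat "$base/all/rp_filter")" || return 1
    own="$(cat "$base/$iface/rp_filter")" || return 1
    if [ "$all" -gt "$own" ]; then
        echo "$all"
    else
        echo "$own"
    fi
}

# Example: report the value for eth0, if that interface exists here.
effective_rp_filter eth0 2>/dev/null || echo "rp_filter for eth0 not readable here"
```

A result of 1 means strict mode and 2 means loose mode. To switch eth0 to loose mode as suggested above, sysctl -w net.ipv4.conf.eth0.rp_filter=2 is enough, because the higher value wins.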


Some links:

  • To Linux and beyond ! Netfilter Connmark
  • Packet flow in Netfilter and General Networking

Solution 2:

Judging from the comments, the most immediate and important omission is that IP forwarding is turned off. Just run:

echo 1 > /proc/sys/net/ipv4/ip_forward

and check whether the DNATted packets now arrive at IP6.
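
Writing to /proc only lasts until the next reboot. A minimal sketch of a check that can be rerun later to see whether forwarding is still on; the function name is made up, and the optional argument exists only so the function can be pointed at a test file:

```shell
#!/bin/sh
# Hypothetical helper: report whether IPv4 forwarding is enabled.
forwarding_state() {
    file="${1:-/proc/sys/net/ipv4/ip_forward}"
    value="$(cat "$file")" || return 1
    if [ "$value" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

forwarding_state 2>/dev/null || echo "ip_forward not readable here"
```

To make the setting persistent on CentOS 8, the usual approach is a line net.ipv4.ip_forward = 1 in a file under /etc/sysctl.d/.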

The second issue is asymmetric routing. The DNATted packets reach IP6 through 192.168.1.10 (IP5), where they are modified (the destination address is rewritten). The return packets will leave through the default gateway on the LAN (192.168.1.1), so they never pass back through the box that did the DNAT and their source address is not translated back on the way to the connection's origin. They will either keep their RFC1918 source address or be SNATted to something different on 192.168.1.1; either way they will not match any connection at their destination and will likely be dropped.

EDIT:

So, to address the FORWARD chain, I would rewrite it as follows (much simpler IMHO):

table inet filter {
:
    chain FORWARD {
            type filter hook forward priority 0; policy drop;
            ct state established,related accept
            ct status dnat accept
    }
:
}