Multiple internet connections, incoming packets on wrong NIC port (inbound routing issue?)

I'm gonna say sorry from the get go because I don't really know what to ask. There may be an answer somewhere, but my googling didn't get me any.

Here's a sketch of my setup (with simplified names) so we have common ground: [image: sketch of the setup]

ens1 is set as default GW, the rest are not. Traffic flows out as I expect it.

It's a new CentOS 8 VM on VMware ESXi, with a PCIx Intel 4-port NIC passed through and a PCI Intel 1-port NIC passed through (the latter is the internal LAN; it's not pictured and shouldn't really matter, as it's physically on another switch). Just in case that matters.

Now I started to set up nftables from scratch (I just wasn't able to get it done via firewalld with the custom NAT I need, probably because of the issue I just found; getting there).

Here are the rules (pretty much all of them, I just renamed the IFs):

table ip nat {
    chain PREROUTING {
            type nat hook prerouting priority 0; policy accept;
            iif "ens4" goto PREROUTING_ENS4
    }

    chain PREROUTING_ENS4 {
            tcp dport { http, https } log prefix "ENS4_dnat"
            tcp dport { http, https } dnat to 192.168.1.4
    }
}

table inet filter {
    chain INPUT {
            type filter hook input priority 0; policy drop;
            iif "lo" accept
            ct state established,related accept
            ct status dnat accept
            iifname "ens5" goto INPUT_LOCAL
            ct state invalid log prefix "INPUT_STATE_INVALID_DROP: "
            ct state invalid drop
            log prefix "INPUT_FINAL_REJECT: "
            reject with icmpx type admin-prohibited
    }
}

So what I want and expect is that when I make an HTTP request from the internet to IP A.B.F.H, it will come in through IP4 on ens4 and be DNATed to my internal server at 192.168.1.4. However, what I actually get is:

[118553.454575] INPUT_FINAL_REJECT: IN=ens1 OUT= MAC=00:15:17:7f:16:0a:cc:e1:7f:74:28:20:08:00 SRC=W.X.Y.Z DST=A.B.F.H LEN=52 TOS=0x00 PREC=0x00 TTL=115 ID=7927 DF PROTO=TCP SPT=63873 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0

So it's the right IP but it's coming in the wrong NIC port, which is 1 instead of 4.

The dnat log is not being triggered.

So what could cause this and what can I try to fix it?

Eventually, I will have IPs 2 and 3 DNATed the same way to a couple other VMs, and have my LAN go out through IP1.

My previous network had just 1 external static IP, it was a CentOS 6, and I had all the firewalling done manually via iptables. Worked just fine for years.

As I just moved, switched ISPs and had the chance to get these 4 IPs, I figured I'd set up a new gateway, new CentOS, and new firewall.

Too many new things (including the 4-port NIC and the dedicated 5-port switch), so I have no idea what could cause this. (Also, I'm not a network wiz, but I do set things up myself, mostly with Google's help and occasionally this kind of question :) )

Also, in case it matters, my ISP initially gave me just the one IP on ens1; I got the rest about 3-4 days later, after I had verified that the internet was working.

Thank you in advance.


This problem has two parts:

  • ARP flux. It happens when multiple interfaces are on the same Ethernet LAN (the same layer 2 broadcast domain) and their IP addresses are also in the same IP subnet. It affects incoming traffic.

    Multiple interfaces (or not the expected interface) will send an ARP reply for an IP set on another interface. In the end, depending on timing, cache eviction, etc., one interface often ends up handling all the incoming traffic rather than each interface handling its own. This can change seemingly at random over time (cache, timers...) and is due to the weak host model chosen in Linux: IPs are considered to belong to the host rather than to an interface, so ARP requests seen on any interface are answered, using any interface.

  • Multiple routes/interfaces to the same LAN in the main routing table. Only one interface among the "equals" (the first displayed by ip route) is used when the source IP has not been set (e.g. for an initial outgoing connection). If this interface is brought down and up, the order of the routes could even change, switching this default interface. It affects outgoing traffic.

Now you can have both problems at the same time. Depending on the traffic, it's even possible that the incoming and outgoing interfaces won't match (and the traffic could then even be blocked by reverse path filtering).
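The log line in the question already hints at this. In a netfilter LOG line, the MAC= field starts with the 6-byte destination MAC, followed by the source MAC and the EtherType. Here is a small sketch that pulls out the destination MAC so it can be compared with each interface's own address (the helper name log_dst_mac is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: read kernel LOG lines on stdin and print the
# destination MAC, i.e. the first 6 octets of the MAC= field.
log_dst_mac() {
    sed -n 's/.* MAC=\([0-9a-fA-F:]\{17\}\).*/\1/p'
}

# Log line shortened from the question
echo 'INPUT_FINAL_REJECT: IN=ens1 OUT= MAC=00:15:17:7f:16:0a:cc:e1:7f:74:28:20:08:00 SRC=W.X.Y.Z DST=A.B.F.H' | log_dst_mac

# Compare the result with the interface addresses, for example:
#   cat /sys/class/net/ens1/address /sys/class/net/ens4/address
```

If the printed MAC belongs to ens1 while DST is the address configured on ens4, the sender learnt ens1's MAC for that IP, which is what ARP flux looks like.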

There are probably multiple solutions to address this. But to address both problems at the same time, I believe using policy routing together with arp_filter is the best one. The kernel documentation describes arp_filter=1 like this:

    1 - Allows you to have multiple network interfaces on the same subnet, and have the ARPs for each interface be answered based on whether or not the kernel would route a packet from the ARP'd IP out that interface (therefore you must use source based routing for this to work). In other words it allows control of which cards (usually 1) will respond to an arp request.

I prefer it because it's a clean definition of what will happen. Other solutions exist using arp_announce and arp_ignore, but I'm not sure they apply correctly in your case, since you also want to tie the interface and the IP together. I'll leave it up to you to integrate this into your system's configuration.

So you have to tie the interface, the MAC (ARP traffic), and the IP. Almost everything is symmetrical per interface.

To tie the MAC handling to routing (warning: this disrupts the connectivity until routing rules are functional):

sysctl -w net.ipv4.conf.ens1.arp_filter=1
sysctl -w net.ipv4.conf.ens2.arp_filter=1
sysctl -w net.ipv4.conf.ens3.arp_filter=1
sysctl -w net.ipv4.conf.ens4.arp_filter=1
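Settings made with sysctl -w are lost on reboot. A minimal sketch for generating the matching lines for a sysctl.d drop-in file, assuming the interfaces are named ens1 to ens4 as in the question (the function name and the file name mentioned below are arbitrary choices, not something from your system):

```shell
#!/bin/sh
# Print one sysctl.d line per interface given as argument.
# Interface names are the renamed ones from the question; adjust as needed.
gen_arp_filter_conf() {
    for ifname in "$@"; do
        printf 'net.ipv4.conf.%s.arp_filter = 1\n' "$ifname"
    done
}

gen_arp_filter_conf ens1 ens2 ens3 ens4
```

The output could be redirected to something like /etc/sysctl.d/90-arp-filter.conf and loaded with sysctl --system. It's worth checking after a reboot that the values were really applied to the passed-through interfaces.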

Add rules (at fixed priorities, even if that isn't strictly needed) to tie each IP to its own routing table (each table will reference a single interface):

ip rule add pref 10001 from a.b.c.d lookup 1001
ip rule add pref 10002 from a.b.c.e lookup 1002
ip rule add pref 10003 from a.b.f.g lookup 1003
ip rule add pref 10004 from a.b.f.h lookup 1004

Add routes. Each table holds a copy of the main table's routes, but with only one interface per table, and each has its own default route through that same interface. Here a.b.l.0/21 is the LAN subnet and a.b.m.n is the gateway:

ip route add table 1001 a.b.l.0/21 dev ens1
ip route add table 1002 a.b.l.0/21 dev ens2
ip route add table 1003 a.b.l.0/21 dev ens3
ip route add table 1004 a.b.l.0/21 dev ens4
ip route add table 1001 default via a.b.m.n dev ens1
ip route add table 1002 default via a.b.m.n dev ens2
ip route add table 1003 default via a.b.m.n dev ens3
ip route add table 1004 default via a.b.m.n dev ens4
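Since the rule and the two routes are symmetrical per interface, they can be grouped in one helper. This is only a sketch: the names run and tie_iface are made up, the numbering (table 1000+n, priority 10000+n) simply mirrors the commands above, and the addresses in the example call are placeholders from the 203.0.113.0/24 documentation range, not your real ones. By default it only prints the commands; setting APPLY=1 runs them.

```shell
#!/bin/sh
# Print the command, or execute it when APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# tie_iface IFNAME ADDRESS N SUBNET GATEWAY
# Ties ADDRESS to routing table 1000+N, which only knows about IFNAME.
tie_iface() {
    ifname=$1; addr=$2; n=$3; subnet=$4; gw=$5
    run ip rule add pref "$((10000 + n))" from "$addr" lookup "$((1000 + n))"
    run ip route add table "$((1000 + n))" "$subnet" dev "$ifname"
    run ip route add table "$((1000 + n))" default via "$gw" dev "$ifname"
}

# Placeholder values (documentation range), shown in print-only mode
tie_iface ens4 203.0.113.14 4 203.0.113.0/24 203.0.113.1
```

Printing first makes it easy to review the exact commands before applying them on a remote machine where a routing mistake could cut the connection.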

Flush the learnt, possibly wrong, entries from the ARP tables:

ip neighbour flush dev ens1
ip neighbour flush dev ens2
ip neighbour flush dev ens3
ip neighbour flush dev ens4

Force other devices on the LAN (in case there is more than just the ISP router) to update their own wrong ARP tables by sending a Duplicate Address Detection ARP request (asking who has one's own IP and expecting no answer), which also acts as a gratuitous ARP announcement (this is what Pacemaker/Corosync cluster nodes do when taking over an IP). This requires the arping command from the iputils package. Each command takes about 5 seconds, so it's better to run them in parallel:

arping -c 5 -D -I ens1 -s a.b.c.d a.b.c.d &
arping -c 5 -D -I ens2 -s a.b.c.e a.b.c.e &
arping -c 5 -D -I ens3 -s a.b.f.g a.b.f.g &
arping -c 5 -D -I ens4 -s a.b.f.h a.b.f.h &
wait

Now everything should behave as intended: when the source IP is already set, the system will use that IP's specific routing table. When it is not yet set (e.g. for an initial outgoing TCP connection), the main table is still used to choose the source IP (I guess ens1's IP). Once it is set, ARP will conform to the specific routing table. Since the expected interface is used at every step, this is compatible with strict reverse path filtering.
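One way to check the result is to ask the kernel which interface it would use for a given source address with ip route get ... from ..., and read the dev field of the answer. A small sketch, assuming the usual one-line output format of ip route get (the helper name route_dev is made up):

```shell
#!/bin/sh
# Hypothetical helper: read "ip route get" output on stdin and print the
# word following "dev", i.e. the outgoing interface.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Intended use, with your real addresses in place of the placeholders
# (any remote address will do as the destination):
#   ip route get 198.51.100.1 from a.b.f.h | route_dev
# This should name ens4 once the rules and tables are in place.
```

Running tcpdump -e -n -i ens4 arp while another host contacts a.b.f.h is a complementary check for the ARP side.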

Be sure the main table leaves no randomness about which IP is used by default: one interface (ens1?) should probably be given a lower metric than the others. Otherwise some route change (e.g. setting an interface down and then up) might reorder the routes and change the default behaviour. The default route should be on this interface.

Also, when an interface is set down, its custom routing table is flushed (as is the corresponding part of the main routing table, as usual). When the interface is set back up, nothing re-adds the entries to the custom table: the kernel adds the LAN route to the main table only (unless noprefixroute is used), and the default route must be added in any case. You must ensure in your configuration that the table's entries are restored whenever its interface comes back up: the configuration should be applied per interface, not just once at boot.
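On CentOS 8 the interfaces are normally managed by NetworkManager, which runs the scripts in /etc/NetworkManager/dispatcher.d/ with the interface name as first argument and the action (up, down, ...) as second. A sketch of such a script, assuming that setup; the file name, the SUBNET and GATEWAY values (documentation-range placeholders) and the interface-to-table mapping are all assumptions to adapt:

```shell
#!/bin/sh
# Hypothetical /etc/NetworkManager/dispatcher.d/50-policy-routes
# NetworkManager passes: $1 = interface name, $2 = action.
SUBNET=203.0.113.0/24      # placeholder for a.b.l.0/21
GATEWAY=203.0.113.1        # placeholder for a.b.m.n

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
}

restore_table() {
    ifname=$1; action=$2
    [ "$action" = up ] || return 0
    case $ifname in
        ens1) n=1 ;;
        ens2) n=2 ;;
        ens3) n=3 ;;
        ens4) n=4 ;;
        *) return 0 ;;          # not one of the WAN interfaces
    esac
    # "replace" avoids an error if the route already exists
    run ip route replace table "$((1000 + n))" "$SUBNET" dev "$ifname"
    run ip route replace table "$((1000 + n))" default via "$GATEWAY" dev "$ifname"
}

restore_table "${1:-}" "${2:-}"
```

The script has to be executable and owned by root to be picked up. Calling it by hand as DRY_RUN=1 ./50-policy-routes ens4 up prints the two commands without touching the routing tables.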