Linux router with 4 NICs not working

I'm setting up a Linux-based router with 4 NICs, and I can't get it to work, despite following the steps suggested on various websites.

Each interface is on a separate subnet as follows:

eth0 10.1.0.254 (255.255.255.0)
eth1 10.1.1.254 (255.255.255.0)
eth2 10.1.2.254 (255.255.255.0)
eth3 10.1.3.254 (255.255.255.0)

Every device on each network is configured to use 10.1.x.254 as the gateway on the local network.

I've enabled IP forwarding (and also made it permanent in /etc/sysctl.conf):

$ cat /proc/sys/net/ipv4/ip_forward
1
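
The global flag is not the only one the kernel exposes: there is also a per-interface "forwarding" entry under /proc/sys/net/ipv4/conf/. A minimal Python sketch for listing them all together follows; the function name and the proc_root parameter are illustrative, not part of any tool.

```python
from pathlib import Path

def forwarding_flags(proc_root="/proc/sys/net/ipv4"):
    """Return {name: bool} for the global ip_forward flag and each
    per-interface 'forwarding' flag found under <proc_root>/conf."""
    root = Path(proc_root)
    flags = {"global": (root / "ip_forward").read_text().strip() == "1"}
    conf = root / "conf"
    if conf.is_dir():
        for entry in sorted(conf.iterdir()):
            flag_file = entry / "forwarding"
            if flag_file.is_file():
                flags[entry.name] = flag_file.read_text().strip() == "1"
    return flags
```

Calling forwarding_flags() with no arguments on the router reads the live values; every entry for eth0 to eth3 would be expected to be True.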

And the routing table looks correct:

$ route
Kernel IP routing table
Destination     Gateway    Genmask         Flags Metric Ref    Use Iface
localnet        *          255.255.255.0   U     0      0        0 eth0
10.1.1.0        *          255.255.255.0   U     0      0        0 eth1
10.1.2.0        *          255.255.255.0   U     0      0        0 eth2
10.1.3.0        *          255.255.255.0   U     0      0        0 eth3
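
As a sanity check on that table, here is a rough Python sketch of the longest-prefix lookup the kernel performs. It assumes "localnet" stands for 10.1.0.0/24 (matching eth0's address); ROUTES and egress_interface are names made up for this illustration.

```python
import ipaddress

# Connected routes copied from the `route` output above.
# Assumption: "localnet" is 10.1.0.0/24, matching eth0's address.
ROUTES = [
    ("10.1.0.0/24", "eth0"),
    ("10.1.1.0/24", "eth1"),
    ("10.1.2.0/24", "eth2"),
    ("10.1.3.0/24", "eth3"),
]

def egress_interface(destination, routes=ROUTES):
    """Longest-prefix match: return the outgoing interface name,
    or None when no route covers the destination."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None

print(egress_interface("10.1.1.1"))     # a host on the eth1 subnet
print(egress_interface("10.1.3.7"))     # a host on the eth3 subnet
print(egress_interface("192.168.0.1"))  # outside all four subnets
```

Because the table has no default route, any address outside the four /24 subnets has no match at all, so the router could only ever reply to hosts on those subnets.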

Interfaces

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:8d:xx:xx:xx
          inet addr:10.1.0.254  Bcast:10.1.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:28919 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16132 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:35054376 (35.0 MB)  TX bytes:1424175 (1.4 MB)
          Interrupt:22

eth1      Link encap:Ethernet  HWaddr 00:1b:21:xx:xx:xx
          inet addr:10.1.1.254  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:54 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:480 (480.0 B)  TX bytes:3996 (3.9 KB)

eth2      Link encap:Ethernet  HWaddr 00:1b:21:xx:xx:xx
          inet addr:10.1.2.254  Bcast:10.1.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9 errors:0 dropped:0 overruns:0 frame:0
          TX packets:57 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1024 (1.0 KB)  TX bytes:4122 (4.1 KB)

eth3      Link encap:Ethernet  HWaddr 00:1b:21:xx:xx:xx
          inet addr:10.1.3.254  Bcast:10.1.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6419 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6702 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:701177 (701.1 KB)  TX bytes:612622 (612.6 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1753 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1753 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:194373 (194.3 KB)  TX bytes:194373 (194.3 KB)

A PC connected to the 10.1.1.0/24 network (IP address 10.1.1.1, gateway correctly configured as 10.1.1.254) can ping the router's local interface (10.1.1.254), but none of the router's other three interfaces.

I don't have any firewalls (hardware or software) on this network setup.

Am I missing something fundamental?

Edit 1

From the PC 10.1.1.1, I can now ping all the interfaces on the router except one, that being 10.1.0.254. (I'm not sure what I've done to fix that!)

All the subnets involved are hosted as VLANs by a number of layer 3 switches. The interface on the Linux router that isn't responding is on the only VLAN that has a routing interface on our core switch.

tcpdump on eth1 doesn't show any sign of ICMP echo when I try to reach this interface, despite 10.1.1.1 being configured to use 10.1.1.254 (eth1 on the Linux router).

Could routing protocol broadcasts from the switch be causing the PC at 10.1.1.1 not to route via 10.1.1.254?

Edit 2

Whilst working on something else, I've now come back to this (without changing anything on the Linux router) and it has stopped working again.

Edit 3

iptables configuration

# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination    

Solution 1:

From what you describe, it should work. If you can ping the gateway's interface on the same network but cannot ping any of its other interfaces, that's mighty strange. The TCP/IP stack should respond correctly to a ping addressed to interface A even if it is received over interface B.

As you are pinging one of the gateway's own addresses (though on a different network), no packet forwarding is involved. I see three possible reasons:

1) The gateway didn't actually receive the packet (routing/filtering problem on the client).

2) The gateway didn't send the reply to the correct place.

3) The gateway chose not to reply (Some kind of firewall. I would double check that, just to be sure it's not the case).

I would use tcpdump or Wireshark to see what is actually on the wire. You should see ping requests leaving the client, and then arriving on the gateway's interface. Then I'd listen on all of the gateway's interfaces to see whether a reply is sent anywhere. If you see a ping request coming in, routing is OK on the client side. If you don't see a ping reply leaving any interface, it's either a firewall or some weirdness in the gateway's routing tables. Finally, if the gateway sends the reply over the correct interface and it doesn't register on the client, it's all the client's fault.
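
The same elimination can be written down as a tiny decision helper. This is only a sketch of the reasoning above, not a tool; the function name, its three boolean inputs (what you observed in the captures) and the returned labels are made up for illustration.

```python
def diagnose(request_at_gateway, reply_from_gateway, reply_at_client):
    """Map three capture observations to the most likely culprit.

    request_at_gateway: echo request seen arriving on a gateway interface
    reply_from_gateway: echo reply seen leaving any gateway interface
    reply_at_client:    echo reply seen arriving at the client
    """
    if not request_at_gateway:
        # Reason 1: the packet never got there.
        return "client-or-path: request never reached the gateway"
    if not reply_from_gateway:
        # Reasons 2 and 3: firewall, or odd routing on the gateway.
        return "gateway: firewall or routing table problem"
    if not reply_at_client:
        return "client: reply was sent but never registered"
    return "ok: ping works end to end"

# The situation from Edit 1: tcpdump on eth1 shows no echo request at all.
print(diagnose(False, False, False))
```

For the case described in Edit 1 this points away from the Linux box and towards the client or whatever sits between the client and eth1.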

I would try a device on some other network (e.g. 10.1.3.0/24), preferably connected directly to the server's NIC, to be sure that nothing interferes with the communication. It may just be a typo, which is devilishly hard to spot when you know what you expect to see. Configuring another device (or reconfiguring the PC you used for the first test) makes it less likely that you'll repeat the same typo.
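
To illustrate how small such a typo can be, here is a Python sketch of the next-hop decision a client makes. The function is hypothetical, and the 255.255.0.0 mask in the second call is an invented example of a typo, not something taken from your configuration.

```python
import ipaddress

def next_hop(client_ip, netmask, gateway, destination):
    """Return the address the client sends the frame to: the destination
    itself if it looks on-link, otherwise the configured gateway."""
    iface = ipaddress.ip_interface(f"{client_ip}/{netmask}")
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:
        return str(dest)  # on-link: the client ARPs for it directly
    return gateway

# Configuration as described in the question: traffic goes via eth1.
print(next_hop("10.1.1.1", "255.255.255.0", "10.1.1.254", "10.1.0.254"))
# Hypothetical mask typo: 10.1.0.254 now looks local and the gateway is bypassed.
print(next_hop("10.1.1.1", "255.255.0.0", "10.1.1.254", "10.1.0.254"))
```

In the second case the echo request would never be addressed to 10.1.1.254 at layer 2, which is one way to end up with nothing showing in tcpdump on eth1.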

And last question -- did it ever work, or is it a new box set up and inserted into the network?

Edit:

As long as you haven't enabled any routing daemon on the Linux box, it will ignore routing protocol traffic from your switches.

What you observe strongly indicates some outside influence (switches, firewalls, aliens or an intern locked in the NOC ;) ). Try to test the setup with two clients connected directly to the Linux box, like this:

           +-------------+
client1 ---+ ethX   ethY +--- client2
           +-------------+
              Linux box

In this setup, test routing on the Linux box between all of its interfaces. Otherwise you will be fighting on several fronts at the same time, which reduces the chances of successful troubleshooting. Once you know whether the Linux box works (or doesn't), it will be time to hunt for the cause of the observed interference.

If you have smart switches connected to all the interfaces of your Linux box, you might define an IP address on the port directly connected to the Linux box and try pinging between the box and the router. Being suspicious and paranoid, I'd carry my own laptop to the box, together with a handful of known-good patch cords, to see what's going on.

Solution 2:

Probably a dumb question, but... you're saying you don't have any firewall on this network, but are you absolutely sure about this?

Many Linux distributions ship with a firewall enabled by default, so even with IP forwarding enabled, the box may not pass any packets until you explicitly configure iptables to allow it (or disable the firewall altogether).

The same is true for your PCs, of course: whether they run Linux or Windows, they will usually have a built-in firewall that is active by default unless you explicitly disable it.
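
One way to take the guesswork out of this is to check a saved listing mechanically. Below is a rough Python sketch, assuming the plain "iptables -L" output format shown in Edit 3 of the question; the function names are invented for this example. Note that "iptables -L" lists only the filter table, so the nat and mangle tables would need the same check with "-t nat" and "-t mangle".

```python
import re

# Matches "Chain INPUT (policy ACCEPT)" and "Chain foo (1 references)".
CHAIN_RE = re.compile(r"^Chain (\S+) \((?:policy (\w+)|\d+ references)")

def summarize_iptables(listing):
    """Return {chain: (policy, rule_count)} parsed from `iptables -L` text.
    User-defined chains have no policy, so theirs is reported as None."""
    chains = {}
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        match = CHAIN_RE.match(line)
        if match:
            current = match.group(1)
            chains[current] = [match.group(2), 0]
        elif current and line and not line.startswith("target"):
            chains[current][1] += 1  # any other non-blank line is a rule
    return {name: tuple(value) for name, value in chains.items()}

def is_wide_open(listing):
    """True only if every chain has policy ACCEPT and no rules at all."""
    summary = summarize_iptables(listing)
    return bool(summary) and all(
        policy == "ACCEPT" and count == 0 for policy, count in summary.values()
    )

# Listing copied from Edit 3 of the question.
SAMPLE = """\
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
"""

print(summarize_iptables(SAMPLE))
print(is_wide_open(SAMPLE))
```

For the listing in Edit 3 this reports three empty chains with policy ACCEPT, so the filter table on the router is not dropping anything; the firewalls on the PCs still need to be checked separately.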