Linux policy routing - packets not coming back [closed]

I am trying to set up policy routing on my home server. My network looks like this:

Host routed          VPN gateway          Internet link
through VPN

192.168.0.35/24 ---> 192.168.0.5/24   ---> 192.168.0.1 DSL router
                     10.200.2.235/22  ....             .... 10.200.0.1  VPN server

Traffic from 192.168.0.32/27 should be, and is, routed through the VPN. I also want to define policies that route some traffic originating on 192.168.0.5 itself through the VPN, starting with traffic from the user with uid 2000. Policy routing is done using the iptables MARK target and ip rule fwmark.

The problem:

When connecting as user 2000 from 192.168.0.5, tcpdump shows outgoing packets, but nothing comes back. Traffic from 192.168.0.35 works fine (there I am using a source-based policy rather than fwmark).

Here is my VPN gateway setup:

# uname -a
Linux placebo 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:49:02 UTC 2012 i686 i686 i386 GNU/Linux
# iptables -V
iptables v1.4.12
# ip -V
ip utility, iproute2-ss111117

iptables rules (all policies in the filter table are ACCEPT):

# iptables -t mangle -nvL
Chain PREROUTING (policy ACCEPT 770K packets, 314M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 767K packets, 312M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 5520 packets, 1920K bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 782K packets, 901M bytes)
 pkts bytes target     prot opt in     out     source               destination         
   74  4707 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            owner UID match 2000 MARK set 0x3

Chain POSTROUTING (policy ACCEPT 788K packets, 903M bytes)
 pkts bytes target     prot opt in     out     source               destination         


# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 996 packets, 51172 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 7 packets, 432 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 1364 packets, 112K bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 2302 packets, 160K bytes)
 pkts bytes target     prot opt in     out     source               destination         
  119  7588 MASQUERADE  all  --  *      vpn  0.0.0.0/0            0.0.0.0/0           

Routing:

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master lan state UNKNOWN qlen 1000
    link/ether 00:40:63:f9:c3:8f brd ff:ff:ff:ff:ff:ff
       valid_lft forever preferred_lft forever
3: lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 00:40:63:f9:c3:8f brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.5/24 brd 192.168.0.255 scope global lan
    inet6 fe80::240:63ff:fef9:c38f/64 scope link 
       valid_lft forever preferred_lft forever
4: vpn: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
    link/none 
    inet 10.200.2.235/22 brd 10.200.3.255 scope global vpn

# ip rule show
0:  from all lookup local 
32764:  from all fwmark 0x3 lookup VPN 
32765:  from 192.168.0.32/27 lookup VPN 
32766:  from all lookup main 
32767:  from all lookup default 

# ip route show table VPN
default via 10.200.0.1 dev vpn 
10.200.0.0/22 dev vpn  proto kernel  scope link  src 10.200.2.235 
192.168.0.0/24 dev lan  proto kernel  scope link  src 192.168.0.5

# ip route show
default via 192.168.0.1 dev lan  metric 100 
10.200.0.0/22 dev vpn  proto kernel  scope link  src 10.200.2.235 
192.168.0.0/24 dev lan  proto kernel  scope link  src 192.168.0.5 

tcpdump output showing no traffic coming back when the connection is made from 192.168.0.5 as user 2000:

# tcpdump -i vpn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vpn, link-type RAW (Raw IP), capture size 65535 bytes
### Traffic from user 2000 on 192.168.0.5 ###
10:19:05.629985 IP 10.200.2.235.37291 > 10.100-78-194.akamai.com.http: Flags [S], seq 2868799562, win 14600, options [mss 1460,sackOK,TS val 6887764 ecr 0,nop,wscale 4], length 0
10:19:21.678001 IP 10.200.2.235.37291 > 10.100-78-194.akamai.com.http: Flags [S], seq 2868799562, win 14600, options [mss 1460,sackOK,TS val 6891776 ecr 0,nop,wscale 4], length 0
### Traffic from 192.168.0.35 ###
10:23:12.066174 IP 10.200.2.235.49247 > 10.100-78-194.akamai.com.http: Flags [S], seq 2294159276, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 557451322 ecr 0,sackOK,eol], length 0
10:23:12.265640 IP 10.100-78-194.akamai.com.http > 10.200.2.235.49247: Flags [S.], seq 2521908813, ack 2294159277, win 14480, options [mss 1367,sackOK,TS val 388565772 ecr 557451322,nop,wscale 1], length 0
10:23:12.276573 IP 10.200.2.235.49247 > 10.100-78-194.akamai.com.http: Flags [.], ack 1, win 8214, options [nop,nop,TS val 557451534 ecr 388565772], length 0
10:23:12.293030 IP 10.200.2.235.49247 > 10.100-78-194.akamai.com.http: Flags [P.], seq 1:480, ack 1, win 8214, options [nop,nop,TS val 557451552 ecr 388565772], length 479
10:23:12.574773 IP 10.100-78-194.akamai.com.http > 10.200.2.235.49247: Flags [.], ack 480, win 7776, options [nop,nop,TS val 388566081 ecr 557451552], length 0

UPDATE:

I have done what @BatchyX suggested:

# iptables -t mangle -nvL
Chain PREROUTING (policy ACCEPT 3 packets, 179 bytes)
 pkts bytes target     prot opt in     out     source               destination         
  173 15993 CONNMARK   all  --  *      *       0.0.0.0/0            0.0.0.0/0            CONNMARK restore

Chain INPUT (policy ACCEPT 3 packets, 179 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 1 packets, 67 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 1 packets, 60 bytes)
 pkts bytes target     prot opt in     out     source               destination         
   83  5247 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            owner UID match 2000 MARK set 0x3
  166 16053 CONNMARK   all  --  *      *       0.0.0.0/0            0.0.0.0/0            CONNMARK save

Chain POSTROUTING (policy ACCEPT 2 packets, 127 bytes)
 pkts bytes target     prot opt in     out     source               destination        

Also, I have disabled rp_filter for vpn

# echo 0 > /proc/sys/net/ipv4/conf/vpn/rp_filter

It is better now - I am receiving SYN,ACK packets, but the handshake does not seem to complete. Also, the checksums of outgoing packets seem to be wrong...

Just as a clue - it is a double NAT scenario - I am NATing packets entering VPN and my VPN provider NATs them before forwarding them to the world.

# tcpdump -vvi vpn
tcpdump: listening on vpn, link-type RAW (Raw IP), capture size 65535 bytes
16:27:56.308479 IP (tos 0x10, ttl 64, id 49013, offset 0, flags [DF], proto TCP (6), length 60)
    10.200.2.235.58020 > wi-in-f104.1e100.net.http: Flags [S], cksum 0xff0b (incorrect -> 0x9790), seq 3580181028, win 14600, options [mss 1460,sackOK,TS val 12420433 ecr 0,nop,wscale 4], length 0
16:27:56.488691 IP (tos 0x0, ttl 46, id 44196, offset 0, flags [none], proto TCP (6), length 60)
    wi-in-f104.1e100.net.http > 10.200.2.235.58020: Flags [S.], cksum 0x12a2 (correct), seq 3226424033, ack 3580181029, win 14180, options [mss 1367,sackOK,TS val 1968045661 ecr 12420433,nop,wscale 6], length 0
16:27:56.799066 IP (tos 0x0, ttl 46, id 44197, offset 0, flags [none], proto TCP (6), length 60)
    wi-in-f104.1e100.net.http > 10.200.2.235.58020: Flags [S.], cksum 0x116c (correct), seq 3226424033, ack 3580181029, win 14180, options [mss 1367,sackOK,TS val 1968045971 ecr 12420433,nop,wscale 6], length 0

Update 2:

As stated before, I am getting SYN,ACK now, but I cannot complete the handshake with the ACK packet. So if I telnet from the routed user's account I get:

routed@placebo ~ # telnet 85.214.204.92 80
Trying 85.214.204.92...
telnet: Unable to connect to remote host: Connection timed out

And the corresponding tcpdump:

# tcpdump -vvi vpn
tcpdump: listening on vpn, link-type RAW (Raw IP), capture size 65535 bytes
20:33:51.940151 IP (tos 0x10, ttl 64, id 65041, offset 0, flags [DF], proto TCP (6), length 60)
    10.200.2.235.60547 > korn.vibfolks.eu.http: Flags [S], cksum 0x3014 (incorrect -> 0xe817), seq 151728396, win 14600, options [mss 1460,sackOK,TS val 16109341 ecr 0,nop,wscale 4], length 0
20:33:52.142823 IP (tos 0x0, ttl 50, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    korn.vibfolks.eu.http > 10.200.2.235.60547: Flags [S.], cksum 0xf897 (correct), seq 986246473, ack 151728397, win 14480, options [mss 1367,sackOK,TS val 62899312 ecr 16109341,nop,wscale 6], length 0
20:33:52.937974 IP (tos 0x10, ttl 64, id 65042, offset 0, flags [DF], proto TCP (6), length 60)
    10.200.2.235.60547 > korn.vibfolks.eu.http: Flags [S], cksum 0x3014 (incorrect -> 0xe71d), seq 151728396, win 14600, options [mss 1460,sackOK,TS val 16109591 ecr 0,nop,wscale 4], length 0
20:33:53.140728 IP (tos 0x0, ttl 50, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    korn.vibfolks.eu.http > 10.200.2.235.60547: Flags [S.], cksum 0xf79e (correct), seq 986246473, ack 151728397, win 14480, options [mss 1367,sackOK,TS val 62899561 ecr 16109341,nop,wscale 6], length 0
20:33:53.341764 IP (tos 0x0, ttl 50, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    korn.vibfolks.eu.http > 10.200.2.235.60547: Flags [S.], cksum 0xf76b (correct), seq 986246473, ack 151728397, win 14480, options [mss 1367,sackOK,TS val 62899612 ecr 16109341,nop,wscale 6], length 0

But non-routed user connects without problems:

nonrouted@placebo ~ $ telnet 85.214.204.92 80
Trying 85.214.204.92...
Connected to 85.214.204.92.
Escape character is '^]'.
^]

telnet> quit
Connection closed.

Update 3

I have added logging rules into mangle and nat tables to find out where packets get lost.

I log in mangle OUTPUT before and after marking (based on uid), in nat POSTROUTING (based on the out interface), in mangle PREROUTING (based on the in interface), and in mangle INPUT and FORWARD (based on the restored mark).

Dec  9 01:00:55 placebo kernel: [80760.497780] [VPN mangle OUTPUT pre] IN= OUT=lan SRC=192.168.0.5 DST=85.214.204.137 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=30041 DF PROTO=TCP SPT=48700 DPT=80 SEQ=3158481901 ACK=0 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A0132EEB40000000001030304) 
Dec  9 01:00:55 placebo kernel: [80760.497819] [VPN mangle OUTPUT post] IN= OUT=lan SRC=192.168.0.5 DST=85.214.204.137 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=30041 DF PROTO=TCP SPT=48700 DPT=80 SEQ=3158481901 ACK=0 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A0132EEB40000000001030304) MARK=0x3 
Dec  9 01:00:55 placebo kernel: [80760.497875] [VPN nat POSTROUTING] IN= OUT=vpn SRC=192.168.0.5 DST=85.214.204.137 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=30041 DF PROTO=TCP SPT=48700 DPT=80 SEQ=3158481901 ACK=0 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A0132EEB40000000001030304) MARK=0x3 
Dec  9 01:00:55 placebo kernel: [80760.695265] [VPN mangle PREROUTING pre] IN=vpn OUT= MAC= SRC=85.214.204.137 DST=10.200.2.235 LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=0 DF PROTO=TCP SPT=80 DPT=48700 SEQ=3597895441 ACK=3158481902 WINDOW=14480 RES=0x00 ACK SYN URGP=0 OPT (020405570402080A03FCE5720132EEB401030306) 
Dec  9 01:00:55 placebo kernel: [80760.695305] [VPN mangle PREROUTING post] IN=vpn OUT= MAC= SRC=85.214.204.137 DST=10.200.2.235 LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=0 DF PROTO=TCP SPT=80 DPT=48700 SEQ=3597895441 ACK=3158481902 WINDOW=14480 RES=0x00 ACK SYN URGP=0 OPT (020405570402080A03FCE5720132EEB401030306) MARK=0x3

Conntrack shows:

# conntrack -L --output extended | grep 85.214.204.137 | grep tcp
ipv4     2 tcp      6 59 SYN_RECV src=192.168.0.5 dst=85.214.204.137 sport=48724 dport=80 src=85.214.204.137 dst=10.200.2.235 sport=80 dport=48724 mark=3 use=1

Conclusion - packets never make it to INPUT... why? bad routing?


Make sure your routing is symmetric, or disable reverse path filtering (only if you know what you are doing, because fixing your routing is always a better choice).

Let's do the test, with traffic coming from 192.168.0.33:

192.168.0.33 -> 192.178.100.10 iif eth0

Reverse path filtering is enabled by default in Ubuntu. It inverts the source and destination addresses and tries to select a route as if it were sending a packet with those addresses. If the resulting interface does not match the interface on which the packet was received, the packet is considered to be spoofed.

So the kernel tries to route 192.178.100.10 -> 192.168.0.33 ... it looks up table main ... finds the entry 192.168.0.0/24 via eth0, which is also the interface on which the packet was received, so the packet is not dropped.

So you NAT that and send 10.200.2.235 -> 192.178.100.10 on the VPN interface. The VPN program encapsulates it and sends it as 192.168.0.5 -> remotevpn. The answer from the VPN server then arrives on that same interface, so reverse path filtering obviously passes here. The VPN program decapsulates the reply (192.178.100.10 -> 10.200.2.235), then NAT takes place, mangling the packet to restore the original destination address. Reverse path filtering is then applied to the resulting packet:

192.178.100.10 -> 192.168.0.33 iif vpn

Let's try to route 192.168.0.33 to 192.178.100.10 ... lookup table VPN ... default via 10.200.0.1 which is on dev vpn: PASS.

Now you want to do things from your host, as 192.168.0.5 or as 10.200.2.235, with mark 3. You send that to your VPN program, which sends it from 192.168.0.5 to the VPN remote. You get an answer the same way, then the VPN program decapsulates it (192.178.100.10 -> 192.168.0.5 (or 10.200.2.235)), and then reverse path filtering takes place.

192.168.0.5 or 10.200.2.235 -> 192.178.100.10 ... does not look up table VPN (it has no mark and does not come from 192.168.0.32/27), so it ends up in table main, which tells it to use interface eth0. Reverse path filtering fails, so the packet is dropped as an IP source spoofing attempt. Thus you don't see the replies.
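
The check walked through above can be reproduced by hand. This is a minimal sketch, assuming the interface name vpn and the example addresses used in this answer; rpf_probe and martian_log_on are hypothetical helper names, not existing tools. `ip route get ... from ... iif ...` asks the kernel to route a packet as if it had arrived on that interface; when strict reverse path filtering rejects it, the lookup is expected to fail with an error (commonly "Invalid cross-device link") instead of printing a route. Enabling log_martians should make such drops appear in the kernel log.

```shell
#!/bin/sh
# Hypothetical helper: ask the kernel how it would handle a packet that
# arrives on IN_IFACE with the given source and destination addresses.
# Usage: rpf_probe REMOTE_SRC LOCAL_DST IN_IFACE
rpf_probe() {
    ip route get "$2" from "$1" iif "$3"
}

# Hypothetical helper: log packets with impossible or filtered source
# addresses ("martians") seen on IFACE to the kernel log (needs root).
# Usage: martian_log_on IFACE
martian_log_on() {
    sysctl -w "net.ipv4.conf.$1.log_martians=1"
}

# Example (run manually on the gateway):
#   rpf_probe 192.178.100.10 192.168.0.5 vpn
#   martian_log_on vpn
```

Note that the probe does not carry a firewall mark, so it only shows what happens to an unmarked reply.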

As for why tcpdump does not show these packets ... maybe there is a routing problem on the VPN endpoint too.

As for a solution, in your case, I would use conntrack's connection mark, and set the mark of each incoming packet to the mark of its conntrack connection:

# keep that rule
OUTPUT -m owner .... -j MARK 0x3
# add this one after the previous one: it saves the current mark into connmark
OUTPUT -j CONNMARK --save-mark

# and add this one (in mangle), which sets the mark to the connmark
# if conntrack determines that it is from the same connection.
PREROUTING -j CONNMARK --restore-mark
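
Spelled out as full commands, the three rules above might look like the sketch below. It assumes uid 2000 and mark 3 from the question and places all three rules in the mangle table; install_connmark_rules is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: mark one user's traffic, save the mark on the
# connection, and restore it on packets of the same connection.
# Usage: install_connmark_rules USER_ID MARK   (needs root)
install_connmark_rules() {
    uid=$1
    mark=$2
    # Mark locally generated packets owned by the given user.
    iptables -t mangle -A OUTPUT -m owner --uid-owner "$uid" -j MARK --set-mark "$mark"
    # Copy the packet mark into the conntrack entry.
    iptables -t mangle -A OUTPUT -j CONNMARK --save-mark
    # On incoming packets, copy the conntrack mark back to the packet.
    iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
}

# Example (run manually as root):
#   install_connmark_rules 2000 3
```

Rule order matters in OUTPUT: the save rule has to come after the MARK rule, otherwise an empty mark is saved.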

EDIT:

You shouldn't have to disable reverse path filtering to make this iptables solution work. Disabling rp_filter needlessly is not a good way to solve problems; it only hides them.

Now for some random thoughts:

  • I keep making guesses about which IP address is used by your programs. Find a program that actually prints out which source IP address it is using, or tell telnet to bind to either 192.168.0.5 or 10.200.2.235. tcpdump will only show outgoing packets once they are NATed, and will only show incoming packets before they are un-NATed, so it doesn't tell you which address is actually used. As an expert solution, you can also put an NFLOG rule in a chain and examine with tcpdump what goes through that chain.

  • Don't masquerade everything going out to vpn, only things that don't come from vpn's IP or subnet. Masquerading your own traffic as your own traffic seems pointless. Maybe conntrack is confused by that.
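
Both suggestions can be sketched as iptables rules. This is only an illustration under assumptions: uid 2000, the vpn interface and the address 10.200.2.235 come from the question, NFLOG group 5 is an arbitrary choice, and nflog_tap and masquerade_foreign_only are hypothetical helper names. Reading the group with `tcpdump -i nflog:5` assumes a libpcap build with NFLOG support.

```shell
#!/bin/sh
# Hypothetical helper: copy one user's outgoing packets to an NFLOG group
# in mangle OUTPUT, i.e. before NAT rewrites the source address.
# Usage: nflog_tap USER_ID GROUP   (needs root)
nflog_tap() {
    iptables -t mangle -A OUTPUT -m owner --uid-owner "$1" -j NFLOG --nflog-group "$2"
}

# Hypothetical helper: masquerade only packets whose source is not already
# the interface's own address.
# Usage: masquerade_foreign_only OUT_IFACE OWN_ADDR   (needs root)
masquerade_foreign_only() {
    iptables -t nat -A POSTROUTING -o "$1" ! -s "$2" -j MASQUERADE
}

# Example (run manually as root):
#   nflog_tap 2000 5
#   tcpdump -n -i nflog:5
#   masquerade_foreign_only vpn 10.200.2.235
```

The second helper only appends a rule; the existing blanket MASQUERADE rule on vpn would have to be removed for it to have any effect.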


OK, got it working... I still do not know what I was doing wrong before. Anyway, to get it working I used:

iptables -t mangle -A OUTPUT -m owner --uid-owner 2000 -j MARK --set-mark 3
iptables -t nat -A POSTROUTING -o vpn -j MASQUERADE

ip rule add fwmark 3 lookup VPN
ip route add default via x.x.x.x table VPN

sysctl -w net.ipv4.conf.vpn.rp_filter=2
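
A possible reason why rp_filter=2 matters here: per the kernel's ip-sysctl documentation, the value used for an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter (0 = off, 1 = strict, 2 = loose). The sketch below computes that effective value. It assumes conf/all/rp_filter is 1 on this machine, which I have not verified; max_rp_filter and effective_rp_filter are hypothetical helper names.

```shell
#!/bin/sh
# Larger of two rp_filter settings (0 = off, 1 = strict, 2 = loose).
max_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Effective rp_filter for an interface: the maximum of the "all" value
# and the per-interface value.
# Usage: effective_rp_filter IFACE [CONF_DIR]
effective_rp_filter() {
    conf=${2:-/proc/sys/net/ipv4/conf}
    max_rp_filter "$(cat "$conf/all/rp_filter")" "$(cat "$conf/$1/rp_filter")"
}

# Example (run manually on the gateway):
#   effective_rp_filter vpn
```

Under that assumption, writing 0 to conf/vpn/rp_filter still leaves strict mode in effect (max of 1 and 0), while writing 2 switches the interface to loose mode. The same documentation describes src_valid_mark, which controls whether the packet's fwmark is included in the reverse path lookup; if it is left at 0, a restored mark would not influence the strict check.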

Hope it helps others as well.