Amazon EC2: OpenVPN server won't route bridged packets from client to VPC subnet

I have a bridged OpenVPN setup on a Linux server in an Amazon EC2 VPC. (I've spent hours on the docs and on similar problems, both here and on the OpenVPN forums, with no luck yet.)

The bridged interface is up and contains both sub-interfaces:

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.0e7c15e787b0       no              eth0
                                                        tap0

Routing looks fine on the VPN server itself. I can SSH in, ping around, and the server responds to the VPN request from the client:

# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG        0 0          0 br0
10.0.0.0        0.0.0.0         255.255.0.0     U         0 0          0 br0

I can ping from both Windows and Mac clients to the VPN server's IP, but not to any other IPs on the VPC subnet. (Those other IPs are OK; they are pingable from the VPN server.)

When I run tcpdump on the bridge interface br0 on the VPN server, it sees the "ARP who-has" requests from the Windows client. However, they aren't going onto the VPC subnet! tcpdump on the destination host does not see the ARP arrive, and the Windows ARP cache remains unfilled. (10.0.0.128 is the Windows client; 10.0.0.58 is the VPN server; 10.0.0.180 is the other IP on the subnet; the output below is from the VPN server.)

root@vpn:# tcpdump -i br0 arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 65535 bytes
21:00:21.092367 ARP, Request who-has 10.0.0.180 tell 10.0.0.128, length 28

[crickets]

I have disabled the Source/Dest. check within the EC2 console on the VPN server's sole network interface.

I have set up iptables as recommended in the bridging HOWTO (linked below), and otherwise followed those instructions exactly.

# iptables -L INPUT -v
Chain INPUT (policy ACCEPT 9 packets, 1008 bytes)
 pkts bytes target     prot opt in     out     source               destination
   38 12464 ACCEPT     all  --  tap0   any     anywhere             anywhere
10447 1297K ACCEPT     all  --  br0    any     anywhere             anywhere

# iptables -L FORWARD -v
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
  918  167K ACCEPT     all  --  br0    any     anywhere             anywhere

Bridging HOWTO: https://openvpn.net/index.php/open-source/documentation/miscellaneous/76-ethernet-bridging.html

I don't think I need to dump the full configs because a lot is clearly working: authentication, certs, compression, address pool, and connection set-up generally. Does the Amazon VPC simply refuse to forward these packets, meaning I should really be on a somewhat-less-virtual cloud to do this?

MORE EXPERIMENTS THE NEXT DAY: The VPC clearly isn't behaving like a true layer 2 subnet. In particular, ARP who-has broadcasts don't actually broadcast! When I ping a non-existent IP (say .5) from .180, .58 doesn't see the request. The VPC appears to be optimizing away ARP broadcasts and delivering them only to .5, and only if a .5 has been configured in the VPC via the management console / API. Leaving tcpdump -vv -i eth0 arp running for a while only shows traffic between the host and the gateway, and this is the case on every host.

Further, pinging the broadcast address on the subnet doesn't work at all. This is backed up by the Amazon VPC FAQ.

So the VPC is likely refusing to recognize the unknown MAC address of .129, since it doesn't exist in its own "virtual ethernet switch". I'll probably shift this to an answer in a week or so. To extend the VPC with your own VPN, it must be via the formal "VPC gateway", which is only designed to work as an extension of a corporate intranet backed by a dedicated hardware router and static IP, not the roaming laptop scenario I'm aiming for.


Solution 1:

Your VPN needs to be routed, not bridged, and the subnet that your VPN clients are on has to be outside the bounds of the VPC supernet.
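As a rough illustration of the routed setup, the sketch below writes only the routing-related lines of an OpenVPN server config to a scratch file. The client subnet 10.8.0.0/24 and the file name are assumptions chosen for the example; the only requirement taken from the answer is that the client subnet lies outside the VPC's 10.0.0.0/16 range shown in the question's routing table. Certificates, compression, and the rest of the existing config would stay as they are.

```shell
#!/bin/sh
# Sketch only: emit the routing-related part of a routed OpenVPN server config.
# CONF is a hypothetical scratch file name; 10.8.0.0/24 is an assumed client subnet.
CONF="${CONF:-./server-routed.conf.example}"

cat > "$CONF" <<'EOF'
# Routed (layer 3) mode: a tun device replaces the bridged tap0
dev tun
# Client address pool, deliberately outside the VPC's 10.0.0.0/16
server 10.8.0.0 255.255.255.0
# Tell clients to reach the VPC range through the tunnel
push "route 10.0.0.0 255.255.0.0"
EOF

echo "wrote $CONF"
```

With this layout the bridge (br0) and the server-bridge directive are no longer needed, since clients sit on their own subnet and the server forwards between the tunnel and eth0.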

Then you add a static route for the VPN client subnet to the VPC route tables. The destination of that route is the client subnet's CIDR block, and its target is the instance ID of your VPN server instance.
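The VPC side of this can be sketched with the AWS CLI. The resource IDs below are placeholders, not real values, and 10.8.0.0/24 is the same assumed client subnet as before. The script is a dry run by default: it prints each command instead of executing it, so nothing changes unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholders only: substitute your own route table and instance IDs.
VPN_CLIENT_CIDR="${VPN_CLIENT_CIDR:-10.8.0.0/24}"   # assumed client subnet
ROUTE_TABLE_ID="${ROUTE_TABLE_ID:-rtb-EXAMPLE}"     # hypothetical ID
VPN_INSTANCE_ID="${VPN_INSTANCE_ID:-i-EXAMPLE}"     # hypothetical ID

# Dry run by default: print the command rather than running it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Allow the kernel to forward packets between the tunnel and eth0.
run sysctl -w net.ipv4.ip_forward=1

# The source/dest check must stay disabled on the VPN server's interface
# (the question already did this in the console).
run aws ec2 modify-instance-attribute \
    --instance-id "$VPN_INSTANCE_ID" --no-source-dest-check

# Route the VPN client subnet to the VPN server instance.
run aws ec2 create-route \
    --route-table-id "$ROUTE_TABLE_ID" \
    --destination-cidr-block "$VPN_CLIENT_CIDR" \
    --instance-id "$VPN_INSTANCE_ID"
```

Security groups on the target instances would presumably also need to admit traffic from the client subnet, since replies are only half of the path.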

The VPC network is a virtual, software-defined network. It's not a pure layer 2 network, but in most ways it emulates one quite nicely. Broadcasts, though, aren't one of those ways.

If you look closely, there is no 1:1 correlation of ARP traffic from one instance to another. The response you get to an ARP who-has doesn't come from the instance with that IP assigned. It comes from the network. If the destination instance sees an incoming request at all, it's not the one you sent.

VPC was designed this way for some pretty compelling reasons, scalability and security among them.

The fact that IP addresses within the scope of the VPC's supernet are expected to be on instances only is a side effect of this. Even if you use the hardware VPN solution available, you still can't have private addresses within the VPC supernet on the other side of that link... so this is not a limitation wedged in to make you pay for something extra... it's just part of the design.

Recommended viewing: VPC/A Day in the Life of a Billion Packets (CPN401)

http://m.youtube.com/watch?v=Zd5hsL-JNY4