ARP reply vanishes from br0 to tap0 using OpenVPN in bridging mode
I have set up a Linux box (a VM on ESXi 5) that acts as an OpenVPN server. The server is configured to bridge the clients, which essentially works, with one exception.
If a client pings any machine on the network other than the server itself, the ping fails. I have ruled out everything I know of (iptables, etc.), and running tcpdump narrowed it down to the following:
- I see ARP requests on tap0 and br0
- I see the ARP replies on br0
- I do NOT see the ARP replies on tap0
Question: why does the br0 device not forward ARP replies to the tap0 device?
Without more information we can only guess, but let's try:
First, make sure that both eth0 and tap0 are in promiscuous mode; br0 itself should not be.
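One way to check this without eyeballing `ip link show` output (which lists PROMISC among the flags) is to read the flags word that Linux exposes in `/sys/class/net/<iface>/flags` and test the `IFF_PROMISC` bit (0x100 in `<linux/if.h>`). This is a minimal sketch, assuming a Linux server with sysfs mounted at the usual place; the default interface names are simply the ones from the question, and the helper names are hypothetical.

```python
import os
import sys

# IFF_PROMISC as defined in <linux/if.h>
IFF_PROMISC = 0x100


def read_flags(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the interface flags, which sysfs stores as a hex string such as '0x1103'."""
    with open(os.path.join(sysfs_root, iface, "flags")) as f:
        return int(f.read().strip(), 16)


def promisc_status(ifaces, sysfs_root: str = "/sys/class/net"):
    """Map each interface name to True/False, or None if its flags could not be read."""
    status = {}
    for iface in ifaces:
        try:
            status[iface] = bool(read_flags(iface, sysfs_root) & IFF_PROMISC)
        except OSError:
            status[iface] = None
    return status


if __name__ == "__main__":
    # Hypothetical defaults: the interface names from the question; pass others as arguments.
    names = sys.argv[1:] or ["eth0", "tap0", "br0"]
    for name, promisc in promisc_status(names).items():
        if promisc is None:
            state = "unreadable"
        else:
            state = "PROMISC" if promisc else "not promiscuous"
        print(f"{name}: {state}")
```

Run it on the OpenVPN server itself; an "unreadable" result usually just means the interface name does not exist on that box.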
Next, check whether you have any arptables or iptables rules that might be interfering.
As you already see the ARP replies on br0, you probably don't have any, but check anyway.
Finally, check the rp_filter settings, as well as any other sysctl parameters you may have set.
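The sketch below reads the relevant values straight from `/proc/sys`. For rp_filter (0 = off, 1 = strict, 2 = loose), the kernel's ip-sysctl documentation says the maximum of `conf/all/rp_filter` and `conf/<iface>/rp_filter` is what applies to an interface, so the code reports that effective value. As an example of "other sysctl parameters", it also lists the `net.bridge.bridge-nf-call-*` switches, which only exist when the `br_netfilter` module is loaded; when `bridge-nf-call-arptables` is 1, bridged ARP frames are passed through arptables. This is a minimal sketch, assuming a Linux host; the function names are hypothetical.

```python
import os

RP_MODES = {0: "off", 1: "strict", 2: "loose"}


def read_int(path):
    """Return the integer stored in a sysctl file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def effective_rp_filter(iface, proc_root="/proc/sys"):
    """The kernel applies max(conf/all/rp_filter, conf/<iface>/rp_filter)."""
    conf = os.path.join(proc_root, "net", "ipv4", "conf")
    values = [read_int(os.path.join(conf, name, "rp_filter")) for name in ("all", iface)]
    values = [v for v in values if v is not None]
    return max(values) if values else None


def bridge_nf_settings(proc_root="/proc/sys"):
    """Collect net.bridge.bridge-nf-call-* values (present only with br_netfilter loaded)."""
    bridge_dir = os.path.join(proc_root, "net", "bridge")
    settings = {}
    if os.path.isdir(bridge_dir):
        for name in sorted(os.listdir(bridge_dir)):
            if name.startswith("bridge-nf-call-"):
                settings[name] = read_int(os.path.join(bridge_dir, name))
    return settings


if __name__ == "__main__":
    # Hypothetical interface list: the names used in the question.
    for iface in ("br0", "eth0", "tap0"):
        value = effective_rp_filter(iface)
        print(f"{iface}: rp_filter={value} ({RP_MODES.get(value, 'unknown')})")
    for name, value in bridge_nf_settings().items():
        print(f"net.bridge.{name} = {value}")
```

Reading the files directly avoids depending on any particular `sysctl` output format, and taking a `proc_root` argument makes the functions easy to try against a copied or mocked tree.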
If your ESXi host has redundant connections to the network, a variety of ARP issues can appear because of the default setting of Net.ReversePathFwdCheckPromisc. pfSense users running CARP were among the first to debug this; it is described at https://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting
In a similar environment we run OpenVPN bridging on FreeBSD, with the added complication of VLANs. On a host where Net.ReversePathFwdCheckPromisc has not been set to 1 and multiple uplinks to the network exist, we see massive packet loss (95%+) on inbound traffic to the tap device. With the setting at 1, it works fine.