Why can Avahi `.local` domains not be resolved over a VPN?

I have a homelab setup with a few Raspberry Pis that use Avahi for service announcement - I can ssh to them with `ssh pi@<hostname>.local` instead of having to remember individual IP addresses. In particular, note that this not only works from my laptop, but also from the Pis themselves - that is, if I am ssh'd to pi1, I can successfully `ssh pi@<another-hostname>.local`.

I recently set up a WireGuard VPN running on one of the Pis to allow me to access my homelab even when away from home. When using this VPN on my laptop, I'm able to ssh to a Pi with `ssh pi@<LAN-IP-address-of-pi>`, but not with `ssh pi@<hostname>.local`.

Why is that? My (admittedly basic) understanding of VPNs is that they operate by making your connection behave "as if" it originated from the VPN server. If that is the case, why would I get different ssh behaviour via a VPN than from the VPN server itself? Or, if that is not the case, how is it incorrect?

My intuition is that DNS resolution does not go via the VPN - when I compare the results of `dig pi.local` from my VPN'd laptop and from the VPN server, I get different results (not sharing the whole payload since I don't know enough about networking and security to know what is safe to share and what is not):

  • from my laptop, the ;SERVER line references 8.8.8.8 (which I believe is a Google-owned standard DNS server, which would correctly "not know about" my Pi's IP address on the LAN)
  • from the VPN server, the ;SERVER line references the LAN IP address of my Pi-hole

Interestingly, though, neither response actually contains an ANSWER section or the Pi's actual IP address.
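For reference, the comparison I ran looks roughly like this (pi1 is a stand-in for one of my actual hostnames):

```shell
# Run once on the VPN'd laptop and once on the VPN server:
dig pi1.local

# On the laptop, the footer shows something like  ;; SERVER: 8.8.8.8#53(8.8.8.8)
# On the VPN server, it shows the Pi-hole's LAN address instead - and in
# neither case is there an ANSWER section.
```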


Solution 1:

My intuition is that DNS resolution does not go via the VPN

Whether it does or not, that doesn't affect mDNS (Avahi). The .local names are resolved by a separate mDNS resolver, not using the standard unicast DNS resolver, and the whole point of mDNS is that it works without DNS servers; it works by broadcasting a query packet to the entire local subnet.

(Multicasting, technically, but as it's link-scoped you can pretend it's the same as a local broadcast.)
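You can see the two resolution paths side by side from any LAN machine with avahi-utils installed (a sketch; `pi1` is a stand-in hostname):

```shell
# Unicast DNS: dig always asks the resolver from /etc/resolv.conf,
# which knows nothing about .local names on your link:
dig pi1.local

# mDNS: avahi-resolve multicasts the query to 224.0.0.251 port 5353,
# and every host on the local link gets a chance to answer for itself:
avahi-resolve --name pi1.local

# dig can even be pointed at the mDNS multicast group directly:
dig @224.0.0.251 -p 5353 pi1.local
```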

You shouldn't actually get any "ANSWER" responses out of dig whatever.local, even if Pi-hole is being referenced. If you do get a response from the Pi-hole, that's not Avahi/mDNS – that's just regular DNS. (Home routers implement local DNS by gathering hostnames from DHCP lease requests; but they usually do not gather mDNS advertisements, even if you happen to have .local as the DNS domain – which you shouldn't.)
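This split is also visible in the client's NSS configuration: with the common nss-mdns package installed, getaddrinfo() – which ssh uses – hands .local names to Avahi, while dig bypasses NSS entirely and always talks to the configured unicast DNS server. A typical setup looks like:

```shell
grep '^hosts' /etc/nsswitch.conf
# Typical output with nss-mdns installed:
#   hosts: files mdns4_minimal [NOTFOUND=return] dns
# mdns4_minimal resolves .local names via multicast and, on a miss,
# stops the lookup before it ever reaches unicast DNS.
```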

Why is that? My (admittedly basic) understanding of VPNs is that they operate by making your connection behave "as if" it originated from the VPN server. If that is the case, why would I get different ssh behaviour via a VPN than from the VPN server itself? Or, if that is not the case, how is it incorrect?

More generally than that, VPNs operate by connecting you to a network together with the VPN server. You could say they form a virtual LAN across the Internet. They don't necessarily masquerade your connections or make it look like you're connecting "as if" the VPN server – that's done separately if needed. (Masquerading is not a VPN-specific feature, it's the exact same NAT that would be used with physical LANs.)

(That's the real difference between VPNs and proxies, aside from the OSI layer distinction. A proxy server's main purpose is to relay requests so that they come "as if" from the proxy. A VPN's main purpose is to build a virtual network.)

Most VPN types don't directly connect you to the VPN server's existing network, however, but create a new one. You have two separate networks – the VPN-created "LAN" and the physical LAN – with the VPN server acting as a router in-between. Like a physical router would have eth0, eth1, eth2 or LAN/WAN ports, your VPN server routes between eth0 (the LAN) and wg0 (the WireGuard VPN).

Just as with two physical subnets separated by a router, the two can exchange regular (unicast) packets between devices, but a router will not relay local broadcasts between subnets, nor will it relay the link-scope multicasts that mDNS uses. So when your VPN client sends out an mDNS multicast query, it only goes as far as the "local" subnet between VPN client and VPN server. It is never forwarded from the server's wg0 interface out through eth0 or anywhere else. To make this work, the VPN server would probably need to run its own avahi-daemon in "relay" or "reflector" mode, where it proxies received mDNS queries to other interfaces and sends replies back.
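As a sketch, enabling the reflector on the Pi that terminates the VPN would look something like this (interface names are assumptions; whether it helps also depends on the VPN interface actually carrying multicast, per the next paragraph):

```shell
# /etc/avahi/avahi-daemon.conf on the VPN server (sketch):
#
#   [server]
#   allow-interfaces=eth0,wg0
#
#   [reflector]
#   enable-reflector=yes
#
# Then restart the daemon:
sudo systemctl restart avahi-daemon
```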

Additionally, many VPN connections don't actually emulate a "broadcast-capable" interface like Ethernet, making broadcast and multicast packet usage impossible in the first place. WireGuard in particular forms a kind of "point-to-multipoint" network where it's more like a bunch of one-to-one connections under the hood – all of them are hidden behind a single wg0 interface, but internally it uses AllowedIPs to decide which peer to send each packet to, and it makes no exception for broadcasts or multicasts. Roughly the same applies to OpenVPN in its default "tun" mode (or most anything else that uses "tun" interfaces).
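For illustration, a laptop-side WireGuard peer section might look like this (all keys, names, and addresses below are made-up placeholders):

```shell
# wg0.conf on the laptop (sketch):
#
#   [Peer]
#   PublicKey  = <server-public-key>
#   Endpoint   = vpn.example.com:51820
#   # Unicast traffic for the VPN subnet and the home LAN is sent to this peer:
#   AllowedIPs = 10.8.0.0/24, 192.168.1.0/24
#
# An mDNS query to 224.0.0.251 matches no AllowedIPs entry, so WireGuard
# has no peer to send it to and silently discards it.
```

(Some setups reportedly add 224.0.0.0/4 to AllowedIPs and pair it with an Avahi reflector on the server to work around this, but that goes beyond WireGuard's stock behaviour.)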

But some other VPNs, such as OpenVPN in "tap" mode, as well as software like Tinc or ZeroTier, do emulate an Ethernet-like network with layer-2 addressing, allowing them to handle multicasts and broadcasts in a more conventional way. In fact, they mostly just emulate regular Ethernet, allowing the VPN server to bridge the VPN and LAN interfaces instead of routing between them. If you set up a "bridge" VPN, then connected VPN clients will in fact become part of the same LAN subnet as local devices and will automatically be able to see each other's mDNS broadcasts.
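With an Ethernet-emulating VPN, the server-side setup is essentially ordinary bridging; a sketch using iproute2, where the interface names tap0 and eth0 are assumptions:

```shell
# Attach both the LAN NIC and the tap-style VPN interface to one bridge,
# putting VPN clients on the same L2 segment as the physical LAN:
sudo ip link add br0 type bridge
sudo ip link set dev eth0 master br0
sudo ip link set dev tap0 master br0
sudo ip link set dev br0 up
```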

(The downside is that they will also see each other's mDNS broadcasts – and ARP broadcasts, and Dropbox broadcasts, and UPnP/SSDP broadcasts, and NetBIOS broadcasts, and WS-Discovery broadcasts, and a whole lot of other background noise.)