Wireguard not completing handshake
I have two Debian GNU/Linux systems (bullseye/sid), both running wireguard on port 23456, both behind NAT. Both run a kernel version > 5.6 (wireguard mainlined).
System A is the server. It dynamically updates a dedicated "A record" on the authoritative nameserver for its internet domain with the public IP address currently assigned to its internet-facing router A (a ZyWALL USG 100 firewall). It does so once every minute, although the public IP address only changes when the router/firewall reboots, which basically never happens.
System B is behind VDSL router B and acts as the wireguard client, pointing to the dynamically updated "A record" and port 33456. Router B is a consumer-grade VDSL router that allows all outbound traffic and only replies inbound.
Router/firewall A (ZyWALL USG 100) is configured to allow UDP packets on port 23456 through it and forwards them to server A. Here is the relevant configuration screen:
Here is the server A wireguard configuration file (keys in this snippet, despite being valid, aren't the real ones):
[Interface]
Address = 10.31.33.100/24, fc00:31:33::1/64
ListenPort = 23456
PrivateKey = iJE/5Qy4uO55uUQg8nnDKQ/dFT1MEq+tDfFXrGNj3GY=
# PreUp = iptables -t nat -A POSTROUTING -s 10.31.33.0/24 -o enp1s0 -j MASQUERADE; ip6tables -t nat -A POSTROUTING -s fc00:31:33::/64 -o enp1s0 -j MASQUERADE
# PostDown = iptables -t nat -D POSTROUTING -s 10.31.33.0/24 -o enp1s0 -j MASQUERADE; ip6tables -t nat -D POSTROUTING -s fc00:31:33::/64 -o enp1s0 -j MASQUERADE
# Simon
[Peer]
PublicKey = QnkTJ+Qd9G5EybA2lAx2rPNRkxiQl1W6hHeEFWgJ0zc=
AllowedIPs = 10.31.33.211/32, fc00:31:33::3/128
And here is client B wireguard configuration (again, keys and domain aren't the real ones):
[Interface]
PrivateKey = YA9cRlF4DgfUojqz6pK89poB71UFoHPM6pdMQabWf1I=
Address = 10.31.33.211/32
[Peer]
PublicKey = p62kU3HoXLJACI4G+9jg0PyTeKAOFIIcY5eeNy31cVs=
AllowedIPs = 10.31.33.0/24, 172.31.33.0/24
Endpoint = wgsrv.example.com:33456
PersistentKeepalive = 25
Here is a dirty diagram that depicts the situation:
Client B -> LAN B -> VDSL Router B (NAT) -> the internet -> ZyWALL (NAT) -> LAN A -> Server A
Starting wireguard on both systems does not establish the VPN connection. After enabling debug messages on the client and adding an iptables LOG rule for OUTPUT packets, I get lots of these:
[414414.454367] IN= OUT=wlp4s0 SRC=10.150.44.32 DST=1.2.3.4 LEN=176 TOS=0x08 PREC=0x80 TTL=64 ID=2797 PROTO=UDP SPT=36883 DPT=33456 LEN=156
[414419.821744] wireguard: wg0-simon: Handshake for peer 3 (1.2.3.4:33456) did not complete after 5 seconds, retrying (try 2)
[414419.821786] wireguard: wg0-simon: Sending handshake initiation to peer 3 (1.2.3.4:33456)
I've added a LOG iptables rule to the server, in order to diagnose router configuration problems.
root@wgserver ~ # iptables -t nat -I INPUT 1 -p udp --dport 23456 -j LOG
It logs the wireguard packets received from the client (but I can't tell if they are somehow invalid or incomplete):
[ 1412.380826] IN=enp1s0 OUT= MAC=6c:62:6d:a6:5a:8e:d4:60:e3:e0:23:30:08:00 SRC=37.161.119.20 DST=10.150.44.188 LEN=176 TOS=0x08 PREC=0x00 TTL=48 ID=60479 PROTO=UDP SPT=8567 DPT=23456 LEN=156
[ 1417.509702] IN=enp1s0 OUT= MAC=6c:62:6d:a6:5a:8e:d4:60:e3:e0:23:30:08:00 SRC=37.161.119.20 DST=10.150.44.188 LEN=176 TOS=0x08 PREC=0x00 TTL=48 ID=61002 PROTO=UDP SPT=8567 DPT=23456 LEN=156
so I'm inclined to assume the A router (ZyWALL USG 100) was correctly configured to let the packets come into the server local network. To confirm that assumption, I've even tried replacing the ZyWALL with another consumer grade router and moving the server over a different internet connection, but the problem is still there, so I'm sure the problem is not the firewall, nor its specific internet connection.
Here is the server network configuration, just in case it matters:
auto lo
iface lo inet loopback
auto enp1s0
iface enp1s0 inet static
address 10.150.44.188/24
gateway 10.150.44.1
On top of that, other wireguard VPN tunnels DO work correctly using the same client, same VDSL router (client-side), same internet connection, similar server configuration (obviously different keys and domain), and similar firewall configuration (server-side, different firewall model).
Solution 1:
It might sound silly, but have you tried generating new server and client keys and retrying? Wireguard behaves exactly like this when the keys in the two profiles don't match: the handshake initiation arrives, but it is never answered.
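Before regenerating anything, you can check whether the existing keys actually pair up. The sketch below is only a rough helper, not something from the question: the function names `conf_value` and `check_pair` and the file names `server.conf` and `client.conf` are made up for illustration. It relies on `wg pubkey`, which reads a private key on stdin and prints the matching public key, and it only looks at the first `[Peer]` in each file.

```shell
# Print the value of the first "<name> = <value>" line in a WireGuard config.
# sed is used instead of splitting on "=" because base64 keys end in "=".
conf_value() {
    # $1 = setting name (e.g. PrivateKey), $2 = path to the config file
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

# Derive the public key from the PrivateKey in config $1 and compare it
# with the first PublicKey listed in config $2.
check_pair() {
    derived=$(conf_value PrivateKey "$1" | wg pubkey)
    listed=$(conf_value PublicKey "$2")
    if [ "$derived" = "$listed" ]; then
        echo "OK: $2 lists the public key of $1"
    else
        echo "MISMATCH: $1 derives $derived but $2 lists $listed"
    fi
}

# Hypothetical file names; check both directions:
# check_pair server.conf client.conf
# check_pair client.conf server.conf
```

If either direction reports a mismatch, regenerating both key pairs (`wg genkey` for each private key, `wg pubkey` for the public one) and pasting them in again is the quickest fix.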
Solution 2:
OK, you mentioned that the client is on VDSL, so I suspect you have an MTU problem.
The normal MTU of a wired (and these days, wireless) network connection is 1500 bytes, but on *DSL the PPPoE layer takes up 8 bytes, making the usable MTU actually 1492. (It's also possible your network connection has been set to an even lower MTU.)
Wireguard's packet overhead is 80 bytes, so on a 1500-byte link the tunnel MTU defaults to 1420 (1500 - 80). Try lowering this by the same 8 bytes, to 1412. (Or lower still if your link MTU is already below 1492.)
You also need the client to tell the server to send smaller packets through the tunnel. For TCP traffic this can be done with an iptables rule that clamps the MSS.
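To spell out the arithmetic, here is a small sketch using only the numbers given above. The helper `probe_link_mtu` is a made-up name, and it assumes a Linux `ping` from iputils (`-M do` forbids fragmentation) and a target that answers ICMP echo, which your firewall may not allow.

```shell
link_mtu=1492     # usable MTU on a PPPoE link (1500 - 8), per the text above
wg_overhead=80    # Wireguard per-packet overhead, per the text above
wg_mtu=$((link_mtu - wg_overhead))
echo "MTU = $wg_mtu"   # prints "MTU = 1412"

# Optional check that the link really carries $link_mtu bytes unfragmented.
# The ICMP payload is the MTU minus 20 bytes (IPv4 header) and 8 bytes (ICMP header).
probe_link_mtu() {
    # $1 = host to ping, $2 = link MTU to test
    ping -c 3 -M do -s $(( $2 - 28 )) "$1"
}

# Example (host name taken from the question's client config):
# probe_link_mtu wgsrv.example.com "$link_mtu"
```

If the probe fails at 1492 but succeeds with a smaller value, subtract 80 from that smaller value instead.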
On the client side wg0.conf you will need something like:
[Interface]
MTU = 1412
PostUp = iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
PostDown = iptables -D FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
;....the rest