Dead gateway detection on Windows 2008 Server
We have recently implemented HAProxy for stackoverflow.com. We decided to use TProxy to preserve the source address of connecting clients, so that our logs and other IIS modules that depend on the client IP address would not require modification. As a result, packets arrive spoofed as if they came from an external internet IP address, when in reality they came from a local 192.168.x.x HAProxy IP on our local network.
Both of our web servers have two NICs: one routable class B address on the public internet with a static IP, DNS, and default gateway, and one private unroutable class C address configured with a default gateway pointed at the private IP for HAProxy. HAProxy has two interfaces, one public and one private. It routes packets transparently between them and directs traffic to the appropriate web server.
Ethernet adapter Internet:

   Description . . . . . . . . . . . : network card #1
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 69.59.196.217 (Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.240
   Default Gateway . . . . . . . . . : 69.59.196.209
   DNS Servers . . . . . . . . . . . : 208.67.222.222
                                       208.67.220.220
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Private Local:

   Description . . . . . . . . . . . : network card #2
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 192.168.0.2 (Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.0.50
   NetBIOS over Tcpip. . . . . . . . : Enabled
We have disabled automatic metrics on each of the web servers and assigned the routable public class B a metric of 10 and our private interface a metric of 20.
We have also set both of these registry keys:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"DeadGWDetectDefault"=dword:00000000
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"EnableDeadGWDetect"=dword:00000000
About twice per day we see issues where one of the web servers cannot contact DNS or make connections out to any other servers on the public internet.
We suspect dead gateway detection is falsely detecting an outage on the public gateway and switching all traffic to the private gateway, which has no DNS access at this point, but we have no way of verifying this.
Is there a way to know if dead gateway detection is running or even an option in Windows 2008 server?
If so, is there a way to disable dead gateway detection in Windows 2008 server?
If not, could there be other reasons that we lose the ability to resolve DNS or connect out for a short time?
Solution 1:
Those Dead Gateway Detection DWORDs are useless on Windows Server 2008. They exist only for compatibility. The TCP/IP driver and Windows router components don't look for these values anymore.
I suspect this feature was rolled into Auto-Tuning, which debuted in Windows Vista. Try executing the following in an elevated command prompt (and reboot):
netsh int tcp set global autotuninglevel=disabled
Update (added September 13, 2009 @7:58PM EST)
If that doesn't work, we'll need more diagnostic output. Start a (circular) trace with either the NetConnection or LAN scenarios and let it continue running until the problem occurs.
netsh trace start scenario=NetConnection maxSize=512
(Example: Starts the NetConnection tracing scenario, with a maximum trace log size of 512MB)
You can open the resulting trace in Network Monitor 3.3; just make sure you install the latest parsers first.
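Alongside the trace, it may help to log when the default routes themselves change, so an outage can be lined up against a gateway switch. The following is only a minimal sketch, assuming Python 3.7+ is installed on the web server and that `route print -4` prints the usual five-column "Active Routes" table (destination, netmask, gateway, interface, metric). The function names `parse_default_routes`, `preferred_gateway`, and `watch` are made up for illustration.

```python
import subprocess
import time


def parse_default_routes(route_output):
    """Return (gateway, interface, metric) for each active default route.

    Assumes the five-column "Active Routes" layout of `route print -4`.
    Persistent-route rows have only four columns and are skipped.
    """
    routes = []
    for line in route_output.splitlines():
        fields = line.split()
        if (len(fields) == 5
                and fields[0] == "0.0.0.0"
                and fields[1] == "0.0.0.0"
                and fields[4].isdigit()):
            routes.append((fields[2], fields[3], int(fields[4])))
    return routes


def preferred_gateway(routes):
    """Pick the gateway of the lowest-metric default route, or None."""
    if not routes:
        return None
    return min(routes, key=lambda route: route[2])[0]


def watch(interval_seconds=10):
    """Poll the routing table and print a line whenever the default routes change."""
    last = None
    while True:
        result = subprocess.run(["route", "print", "-4"],
                                capture_output=True, text=True)
        routes = parse_default_routes(result.stdout)
        if routes != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, routes, "preferred:", preferred_gateway(routes))
            last = routes
        time.sleep(interval_seconds)
```

Calling `watch()` from a console and leaving it running until the next outage should show whether the public default route disappears or loses priority at that moment. If the logged routes never change, the routing table is probably not the culprit and the trace is the better lead.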
Solution 2:
We were not able to arrive at a conclusive result as to why we could not control the behavior of Dead Gateway Detection.
Rather than spend a ton of time troubleshooting this issue, we opted to make our HAProxy instance route outbound traffic to the gateway. We set both web servers' default gateway to the IP of HAProxy and removed the internal gateway address.
[ soweb1 ] 69.59.196.220, GW=69.59.196.211 [haproxy]
|
+---- [haproxy] 69.59.196.211, GW 69.59.196.209
|
[ gw ] 69.59.196.209
Now there is only one default gateway, which eliminates our issue because dead gateway detection is no longer used.
Solution 3:
I would question why you need to change the default gateway to HAProxy at all. Generally you should not change your default gateway unless you're pointing it at a highly available N+1 setup where the gateway IP can fail over to another router or machine if something bad happens. If something happened to your HAProxy machine and you didn't have any out-of-band access, the web servers would just drop off the internet.
I believe you are doing this because you use TProxy to make the client's IP address, rather than the proxy server's, appear in your logs. If so, could I suggest that you do this instead:
- Add "option forwardfor ..." to your HAProxy config
- Install the X-Forwarded-For ISAPI filter
- Remove TProxy from your setup
- Change the default gateway back to the gateway you were using before, with a direct connection to the internet
I don't have a Windows machine to test this on but I believe it should result in the desired effect without the undesired loss of connectivity.
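To illustrate what the X-Forwarded-For approach relies on, here is a minimal sketch of the logic a server-side filter would need. It is not the ISAPI filter's actual implementation. It assumes that HAProxy appends the client address as the last X-Forwarded-For entry and that the proxy reaches the web servers from 192.168.0.50. The function name `client_ip` and its parameters are made up for illustration.

```python
import ipaddress


def client_ip(remote_addr, forwarded_for, trusted_proxies):
    """Return the address that should be logged for a request.

    remote_addr     -- peer address of the TCP connection
    forwarded_for   -- raw X-Forwarded-For header value, or None
    trusted_proxies -- addresses allowed to supply the header
    """
    # Only believe the header when the connection came from our own proxy;
    # anyone connecting directly could forge it.
    if remote_addr not in trusted_proxies or not forwarded_for:
        return remote_addr

    # Assumption: the proxy appends its entry, so the last one is the
    # address the proxy itself saw.
    candidate = forwarded_for.split(",")[-1].strip()
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return remote_addr  # malformed header, fall back to the peer
    return candidate
```

For example, `client_ip("192.168.0.50", "203.0.113.7", {"192.168.0.50"})` yields the forwarded client address, while a request arriving from any other peer keeps its own address. Trusting only the last entry, and only from the proxy, is what keeps a forged header from ending up in the logs.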