Cluster failover and strange gratuitous ARP behavior
Solution 1:
I've started to see machines getting incorrect ARP table entries for several SQL Server instances in a failover cluster.
Client servers are alternately populating their ARP tables with the MAC address of the correct NIC team and the MAC address of one of the physical NICs (not necessarily the one matching the NIC team MAC on that server) on a different cluster node.
This is causing intermittent connection failures for clients on the same LAN as the SQL Cluster.
This behavior has been seen on both VM clients and physical boxes.
This occurs after a failover and lasts for days.
To mitigate this, I've had to set static ARP entries on the more troublesome clients.
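To confirm which clients are affected before pinning static entries, a small script can compare ARP table snapshots over time. The following is only a rough sketch: it assumes the usual Windows `arp -a` output layout, and the IP addresses, MAC addresses, and function names (`parse_arp_table`, `find_flapping`) are made up for illustration.

```python
import re
from collections import defaultdict

# Matches entry lines such as "  10.0.0.50   00-1b-21-aa-bb-01   dynamic"
# (assumed layout of Windows `arp -a` output).
ARP_LINE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\s+(\w+)\s*$"
)


def parse_arp_table(text):
    """Return {ip: mac} for every entry found in `arp -a` style output."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.match(line)
        if match:
            ip, mac, _entry_type = match.groups()
            table[ip] = mac.lower()
    return table


def find_flapping(snapshots, watched_ips):
    """Report watched IPs that appear with more than one MAC across snapshots."""
    seen = defaultdict(set)
    for table in snapshots:
        for ip in watched_ips:
            if ip in table:
                seen[ip].add(table[ip])
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


# Hypothetical snapshots taken a few minutes apart on one client.
SNAPSHOT_A = """
Interface: 10.0.0.20 --- 0xb
  Internet Address      Physical Address      Type
  10.0.0.50             00-1b-21-aa-bb-01     dynamic
  10.0.0.1              00-1f-fe-00-00-01     dynamic
"""
SNAPSHOT_B = """
Interface: 10.0.0.20 --- 0xb
  Internet Address      Physical Address      Type
  10.0.0.50             00-1b-21-cc-dd-02     dynamic
  10.0.0.1              00-1f-fe-00-00-01     dynamic
"""

if __name__ == "__main__":
    snapshots = [parse_arp_table(SNAPSHOT_A), parse_arp_table(SNAPSHOT_B)]
    # 10.0.0.50 stands in for a clustered SQL instance IP.
    print(find_flapping(snapshots, ["10.0.0.50", "10.0.0.1"]))
```

In practice the snapshot text would come from running `arp -a` on a schedule on each suspect client. Any watched IP that shows up with two MACs is a candidate for a static entry.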
ENVIRONMENT:
- Windows 2008 R2 SP1 Servers in a failover cluster
- SQL Server 2008 R2 Instances
- Teamed Intel Gigabit NICs
- HP 28XX switches
- Virtual Machines hosted on Windows Server 2008 R2 SP1 Hyper-V
The Intel NIC team creates a virtual adapter with the MAC address of one of the physical NICs.
I have a suspicion that the Intel NIC teaming software is the culprit, but any other troubleshooting thoughts or solutions would be appreciated.
I'm likely going to rebuild the cluster hosts with Server 2012 and use the in-box NIC teaming there (I have not seen this issue in my testing on that platform).
Solution 2:
Do you have the latest cluster hotfixes applied? There are some fairly serious known defects.
A transient communication failure causes a Windows Server 2008 R2 failover cluster to stop working
https://support.microsoft.com/kb/2550886
Slow failover operation if no router exists between the cluster and an application server
https://support.microsoft.com/kb/2582281
"This issue occurs because the TCP/IP stack of the application server incorrectly ignores gratuitous Address Resolution Protocol (ARP) requests."
Solution 3:
This is purely speculative, but my guess is that there may be some bad interaction with RLB being enabled (it is turned on by default, and Lazerpld, Steven, and Stack Exchange have all now hit whatever this bug is). From the Intel teaming whitepaper:
Receive load balancing (RLB) is a subset of ALB. It allows traffic to flow in both Tx and Rx on all adapters in the team. When creating an RLB team in Windows, this feature is turned on by default. It can be disabled via the Intel® PROSet GUI using the team’s Advanced Settings.
In RLB mode, when a client is trying to connect to a team by sending an ARP request message, Intel ANS takes control of the server ARP reply message coming from the TCP stack in response. Intel ANS then copies into the ARP reply the MAC address of one of the ports in the team chosen to service the particular end client, according to the RLB algorithm. When the client gets this reply message, it includes this match between the team IP and given MAC address in its local ARP table. Subsequently, all packets from this end client will be received by the chosen port. In this mode, Intel ANS allocates team members to service end-client connections in a round-robin fashion, as the clients request connections to the server. In order to achieve a fair distribution of end clients among all enabled members in the team, the RLB client table is refreshed at even intervals (default is five minutes). This is the Receive Balancing Interval, which is a preconfigured setting in the registry. The refresh involves selecting new team members for each client as required. Intel ANS initiates ARP Replies to the affected clients with the new MAC address to connect to, and redistribution of receive traffic is complete when all clients have had their ARP tables updated by Intel ANS.
The OS can send out ARP requests at any time, and these are not under the control of the Intel ANS driver. These are broadcast packets sent out through the primary port. Since the request packet is transmitted with the team’s MAC address (the MAC address of the primary port in the team), all end clients that are connected to the team will update their ARP tables by associating the team’s IP address with the MAC address of the primary port. When this happens, the receive load of those clients collapses to the primary port.
To restart Rx load balancing, Intel ANS sends a gratuitous ARP to all clients in the receive hash table that were transmitting to non-primary ports, with the MAC address of the respective team members. In addition, the ARP request sent by the OS is saved in the RLB hash table, and when the ARP reply is received from the end client, the client’s MAC address is updated in the hash table. This is the same mechanism used to enable RLB when the server initiates the connection.
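To make the whitepaper's description easier to follow, here is a toy model of the behavior it describes: round-robin assignment in ARP replies, the collapse to the primary port when the OS broadcasts its own ARP request, and the periodic rebalance. This is only an illustration of the quoted text, not Intel's actual implementation. The class name, method names, and addresses are all invented.

```python
from itertools import cycle


class RlbTeamModel:
    """Toy model of the RLB behavior quoted above (not Intel ANS code)."""

    def __init__(self, team_ip, member_macs):
        self.team_ip = team_ip
        self.member_macs = list(member_macs)
        # The team's MAC is the MAC of the primary port.
        self.primary_mac = self.member_macs[0]
        self._next_member = cycle(self.member_macs)
        # client IP -> MAC the client currently holds for team_ip
        self.client_table = {}

    def arp_reply_for(self, client_ip):
        """Answer a client's ARP request with the next member's MAC."""
        mac = next(self._next_member)
        self.client_table[client_ip] = mac
        return mac

    def os_broadcast_arp_request(self):
        """An OS-originated ARP request goes out with the primary MAC, so
        every client that hears it re-learns team_ip -> primary MAC."""
        for client_ip in self.client_table:
            self.client_table[client_ip] = self.primary_mac

    def rebalance(self):
        """Periodic refresh: redistribute clients round-robin, as the
        driver would do with directed ARP replies."""
        for client_ip in sorted(self.client_table):
            self.client_table[client_ip] = next(self._next_member)


if __name__ == "__main__":
    team = RlbTeamModel("10.0.0.50", ["00-1b-21-aa-bb-01", "00-1b-21-aa-bb-02"])
    for client in ("10.0.0.21", "10.0.0.22", "10.0.0.23"):
        team.arp_reply_for(client)
    print("after initial replies:", team.client_table)
    team.os_broadcast_arp_request()
    print("after OS broadcast:   ", team.client_table)
    team.rebalance()
    print("after rebalance:      ", team.client_table)
```

The point of the model is that clients legitimately hold different MACs for the same team IP, and that the driver, not the OS, decides which MAC each client is told about.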
So my theory is that when Windows clustering releases the virtual IP, the Intel driver doesn't see that the IP has been released and continues to announce it. That being said, right now this is just a theory.
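For anyone who wants to look for these announcements in a packet capture, it helps to know what one looks like on the wire. A gratuitous ARP is an ordinary ARP packet in which the sender and target IP are the same address, sent to the Ethernet broadcast address. The sketch below builds one as raw bytes using the request form with a zeroed target hardware address, which is one common variant. The function name and the addresses are illustrative, and nothing here sends the frame.

```python
import socket
import struct


def build_gratuitous_arp(ip, mac):
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    mac_bytes = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    eth_header = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)

    # ARP fixed fields: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_bytes + ip_bytes    # sender hardware / protocol address
    arp += b"\x00" * 6 + ip_bytes  # target hardware (zeroed) / same IP

    return eth_header + arp


if __name__ == "__main__":
    frame = build_gratuitous_arp("10.0.0.50", "00-1b-21-aa-bb-01")
    print(len(frame), frame.hex())
```

In a capture taken after a failover, the thing to check is the sender MAC in announcements for the clustered IP. If a node that no longer owns the IP is still the sender, that would support the theory.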