Why would adding a crossover-cabled private NIC cause an IP Address resource to fail?
- 2 node Win 2008 R2 quorum cluster
- Configured 192.168.0.0/24 "public network"
- Configured cluster MSDTC
- Installed SQL 2008 R2 cluster instance
- Manually failover groups - OK
- Reboot server w/o failing over groups - failover OK
- Pull public network cable from one node - failover OK
- Added crossover cable 10.2.0.0/24 "private network"
- Verified ping on private network
- Verified file share browse on private network to C$ admin shares
- Pull public network cable from one node - MSDTC IP address resource fails on original host rather than failing over
- Manually moved MSDTC group to other node - everything onlines normally
- Reseat public network cable on node 1 & verified everything online on node 2
- Pull public network cable from node 2 - MSDTC and SQL IP address resources fail on original host rather than failing over
- Reseat public network cable on node 2 and manually fail all resources back and forth - OK
- Reboot server w/o failing over groups - failover OK
- Removed crossover cable private network and disabled private network NICs (only the single public network remains)
- Pull public network cable from one node - everything including MSDTC fails over normally
Several people have since told me not to do a cable pull test. Does anyone have documentation on why, and on the potential impact? "An MVP says so" won't fly with my manager without documentation.
More importantly, why would adding a crossover cable have this kind of impact?
The private interconnect between two cluster nodes serves a dual purpose: to verify that the nodes can see each other, and to verify that each node has a working private connection.
MS Cluster Service requires both of these checks to pass; otherwise it declares the node unreliable and refuses to fail over, reasoning (quite correctly) that changing nothing in an unknown situation is better than failing over in an unknown situation - possibly to a failed node.
If the private NIC fails, that node automatically assumes itself unsuitable for failover, and effectively exits the cluster.
If both ends of the private connection are plugged into a switch, these tests become independent of each other, and one node can lose its private connection without the other node's link going down as well.
Moral of the story: crossover cables are bad, and best left to amateurs.
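To make the crossover-versus-switch difference concrete, here is a minimal sketch in Python. It is only a toy model of Ethernet link (carrier) state, not the Cluster Service's actual health-check logic, and the names `Nic`, `link_states`, `node1` and `node2` are made up for illustration. It assumes a NIC reports "link up" only when there is a live device at the far end of its cable.

```python
from dataclasses import dataclass


@dataclass
class Nic:
    """Hypothetical stand-in for one node's private NIC."""
    name: str
    cable_plugged: bool = True  # is a cable seated in this NIC's port?


def link_states(a: Nic, b: Nic, topology: str) -> dict:
    """Return {nic name: link up?} for two NICs in the given topology.

    topology is "crossover" (one cable directly between the NICs) or
    "switch" (each NIC has its own cable to a powered switch).
    """
    if topology == "crossover":
        # One shared cable: carrier exists only if BOTH ends are seated,
        # so unplugging either end drops the link on both NICs.
        up = a.cable_plugged and b.cable_plugged
        return {a.name: up, b.name: up}
    if topology == "switch":
        # Each NIC's carrier depends only on its own cable to the switch.
        return {a.name: a.cable_plugged, b.name: b.cable_plugged}
    raise ValueError(f"unknown topology: {topology!r}")


# Simulate pulling the private cable out of node1 only.
node1 = Nic("node1", cable_plugged=False)
node2 = Nic("node2", cable_plugged=True)

crossover_result = link_states(node1, node2, "crossover")
switch_result = link_states(node1, node2, "switch")

print("crossover:", crossover_result)
print("switch:   ", switch_result)
```

In this model the crossover case reports both NICs as down, while the switched case reports only node1 as down, which is the independence described above.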
If it's a crossover cable directly between your cluster nodes, of course the IP address resource will fail. Pulling the connection from one node also causes the other node to see its NIC go to an 'unplugged' status, so how can it bring the IP resource up on the other node?