Database auto-failover in C# does not work when the principal server physically goes offline

Solution 1:

After working with Microsoft for a week, we worked out why this happens.

Essentially, the application does not fail over because the client first has to confirm that the database itself has failed over, and the SQL connection times out before that confirmation completes.

The process for confirming that the database has failed over (with all the default TCP registry settings) is to:

  1. try to communicate with the principal and determine that it is no longer the principal;
  2. communicate with the failover partner to confirm that the database has failed over and that the partner is now the new principal.

When the principal is physically down, this exchange takes about 21 seconds, because the client will:

  1. try to communicate with the principal, wait 3 seconds, and time out;
  2. try to communicate with the principal again, wait 6 seconds, and time out;
  3. try to communicate with the principal again, wait 12 seconds, and time out;
  4. try to communicate with the failover partner, see that it has taken over as principal, and fail over in the application.
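The waits above form a doubling backoff (3, then 6, then 12 seconds). The small C# sketch below makes the arithmetic explicit, assuming exactly three attempts starting at 3 seconds as described, and compares the total with ADO.NET's default connect timeout of 15 seconds. The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;

// Hypothetical helper: models the retry timeline described above, where each
// attempt to reach the offline principal waits twice as long as the previous one.
static class FailoverTimeline
{
    // Returns the elapsed time (in seconds) at which each attempt times out.
    public static int[] CumulativeTimeouts(int initialWaitSeconds, int attempts)
    {
        var cumulative = new int[attempts];
        int wait = initialWaitSeconds;
        int elapsed = 0;
        for (int i = 0; i < attempts; i++)
        {
            elapsed += wait;
            cumulative[i] = elapsed;
            wait *= 2; // the wait doubles on every retry: 3, 6, 12, ...
        }
        return cumulative;
    }

    static void Main()
    {
        int[] timeline = CumulativeTimeouts(3, 3);
        int totalSeconds = timeline[timeline.Length - 1];
        Console.WriteLine($"Attempts time out at: {string.Join(", ", timeline)} s");
        Console.WriteLine($"Failover partner contacted after ~{totalSeconds} s");

        // ADO.NET's default Connect Timeout is 15 seconds.
        int defaultConnectTimeout = 15;
        Console.WriteLine(defaultConnectTimeout > totalSeconds
            ? "Connection survives long enough to fail over."
            : "Connection times out before failover is confirmed.");
    }
}
```

Running it reports cumulative timeout points of 3, 9 and 21 seconds, which is why a connection using the 15-second default gives up before the failover partner is ever tried.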

So if your SQL connection does not wait at least 21 seconds (probably more in practice), it will time out before this sequence finishes and will not fail over at all. Note that the default Connect Timeout in an ADO.NET connection string is only 15 seconds.

The solution is to set the connect timeout in your connection string to a larger value; we use 60 seconds just to be safe.
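A minimal sketch of such a connection string, built with SqlConnectionStringBuilder from System.Data.SqlClient (Microsoft.Data.SqlClient exposes the same properties). The server and database names are placeholders, the helper class is hypothetical, and the FailoverPartner setting assumes a mirrored setup like the one described here.

```csharp
using System;
using System.Data.SqlClient; // Microsoft.Data.SqlClient offers the same builder

// Hypothetical helper that builds a mirrored-database connection string.
static class ConnectionStringExample
{
    public static string Build(string principal, string mirror, string database)
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = principal,     // current principal server
            FailoverPartner = mirror,   // mirror that takes over on failover
            InitialCatalog = database,
            IntegratedSecurity = true,
            ConnectTimeout = 60         // seconds; the default is 15, shorter than the ~21 s retry sequence
        };
        return builder.ConnectionString;
    }

    static void Main()
    {
        // Placeholder server and database names.
        string connectionString = Build("SQLPRINCIPAL", "SQLMIRROR", "MyDatabase");
        Console.WriteLine(connectionString);
    }
}
```

If your connection strings live in configuration instead, the equivalent change is adding Connect Timeout=60 to the existing string.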

Cheers