DNS resolution failing over to secondary DNS - why?

We have a large number of branch offices connected via VPN, but without any kind of server infrastructure. The client machines in each office get their network configuration from an ASA 5505, which is also used for the VPN connection.

The Windows XP client machines are configured to use one of our corporate DNS servers as the primary, with the DNS server of the ISP as the secondary. The idea is that if the VPN connection fails for any reason, staff in the office will still be able to access the internet, and access our webmail and home access portal. In the majority of cases this works fine.

However, for offices based in South America we are seeing DNS resolution on the client machines regularly being done against the ISP DNS server - this results in our corporate resources being effectively unavailable to staff in the offices.

The client machines are able to ping the corporate DNS server ok. When doing an nslookup of a corporate hostname, I get a reply.

I'm thinking one of the following (or a combination) is happening:

  • our corporate DNS server is not always replying to requests in a timely fashion (although why this would only affect clients in one geographic region I don't know)
  • DNS queries from Latin America are somehow delayed, causing the client to treat it as failed (although we have offices at the end of much slower VSAT connections which do not have this issue)
  • a single failure is resulting in a DNS cache entry in Windows that somehow results in the lookups not happening on subsequent tries

Has anyone else come across this issue? Any ideas for resolutions?


Windows queries DNS in this order:

  • hosts file
  • local DNS cache
  • Preferred DNS servers
  • Other DNS servers

MS also has an article describing how the DNS server list is obtained:

The DNS Client service uses a server search list, ordered by preference. This list includes all preferred and alternate DNS servers configured for each of the active network connections on the system.

The list is arranged based on the following criteria:

  • Preferred DNS servers are given first priority.
  • If no preferred DNS servers are available, then alternate DNS servers are used.
  • Unresponsive servers are removed temporarily from these lists.

Windows has an escalating timeout for DNS requests:

Value      Default value  Attempt
1st limit       1 second  Query the preferred DNS server on a preferred connection.
2nd limit      2 seconds  Query the preferred DNS server on all connections.
3rd limit      2 seconds  Query all DNS servers on all connections (1st attempt).
4th limit      4 seconds  Query all DNS servers on all connections (2nd attempt).
5th limit      8 seconds  Query all DNS servers on all connections (3rd attempt).
6th value                 (Must be 0.)
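To make the table easier to reason about, here is a rough Python sketch that turns a timeout list into a timeline. It assumes the attempts run strictly one after another, which is my reading of the table rather than something the article states outright; the function names are made up for illustration.

```python
from itertools import accumulate

# Default values from the table above, in seconds; the trailing 0 ends the list.
DEFAULT_TIMEOUTS = [1, 2, 2, 4, 8, 0]


def attempt_schedule(timeouts):
    """Return (start, limit) pairs in seconds, one per attempt.

    Assumption: each attempt begins only when the previous limit expires.
    """
    # Drop the terminating 0 if present.
    limits = list(timeouts[:-1]) if timeouts and timeouts[-1] == 0 else list(timeouts)
    # Each attempt starts at the sum of all earlier limits.
    starts = [0] + list(accumulate(limits))[:-1]
    return list(zip(starts, limits))


def total_wait(timeouts):
    """Worst-case seconds before the resolver gives up entirely."""
    return sum(timeouts)


for number, (start, limit) in enumerate(attempt_schedule(DEFAULT_TIMEOUTS), 1):
    print(f"attempt {number}: starts at {start}s, waits up to {limit}s")
print(f"gives up after {total_wait(DEFAULT_TIMEOUTS)}s")
```

Under that assumption, the alternate servers are first queried 3 seconds after the original request, and the whole sequence gives up after 17 seconds.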

I could not find a clear answer on this exact point, but it sounds like if Windows doesn't get a response from your primary DNS server within 1 second (1st attempt) or 2 seconds (2nd attempt), that server is removed from the DNS server search list for 15 minutes and the secondary DNS servers are used instead. Since queries to those servers are allowed up to 8 seconds, they are much more likely to get a response in time. (It's unclear to me whether Windows keeps querying the preferred DNS server during the 3rd and later attempts once it has already failed.)

I also suspect that you do indeed have a WAN latency issue for this geographical area, as it would explain why the timeouts are being triggered.
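One way to test the latency theory is to time raw DNS queries from an affected client against the corporate DNS server and count how many miss the 1 second limit. The following is only a sketch using the Python standard library: it sends a plain A-record query over UDP and times the reply. The function names are hypothetical, and the server address and hostname you pass in are yours to supply.

```python
import random
import socket
import struct
import time


def build_query(hostname, query_id):
    """Build a minimal DNS query packet for an A record (RFC 1035)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def time_query(server, hostname, timeout=10.0):
    """Return seconds until a reply with a matching ID arrives, or None."""
    query_id = random.randint(0, 0xFFFF)
    packet = build_query(hostname, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(packet, (server, 53))
        try:
            while True:
                data, _ = sock.recvfrom(512)
                if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == query_id:
                    return time.monotonic() - start
        except socket.timeout:
            return None


def summarize(samples, limit=1.0):
    """Return (slow, total); a sample of None means no reply at all."""
    slow = sum(1 for s in samples if s is None or s > limit)
    return slow, len(samples)


def probe(server, hostname, count=20, pause=1.0):
    """Send `count` queries, `pause` seconds apart, and summarize them."""
    samples = []
    for _ in range(count):
        samples.append(time_query(server, hostname))
        time.sleep(pause)
    return summarize(samples)
```

Calling `probe()` with the corporate DNS server's address and an internal hostname, from a South American office and from one that works, should show whether replies there regularly take longer than a second. Unlike a ping, this measures the full DNS round trip, including the time the server takes to answer.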


One solution is to change the DNS query timeouts, using the DNSQueryTimeouts registry parameter. See also http://drewthaler.blogspot.com/2005/09/changing-dns-query-timeout-in-windows.html
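As a sketch of what that change could look like when scripted: to my knowledge the value is a REG_MULTI_SZ named DNSQueryTimeouts under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, but check that against the linked article before applying anything. The helper names below are made up, and the "every limit must be positive" check is my own assumption; only the trailing 0 comes from the table above.

```python
# Registry location as I understand it; verify before use.
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "DNSQueryTimeouts"


def to_multi_sz(timeouts):
    """Validate a timeout list and return it as strings for REG_MULTI_SZ."""
    if not timeouts or timeouts[-1] != 0:
        raise ValueError("the list must end with 0")
    # Assumption: a zero or negative limit before the terminator is a mistake.
    if any(t <= 0 for t in timeouts[:-1]):
        raise ValueError("every limit before the final 0 must be positive")
    return [str(t) for t in timeouts]


def write_timeouts(timeouts):
    """Write the value to the registry (Windows only, needs admin rights)."""
    import winreg  # imported here so to_multi_sz() also works off Windows

    values = to_multi_sz(timeouts)
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_MULTI_SZ, values)
```

For the situation in the question, something like `write_timeouts([3, 3, 3, 4, 8, 0])` would give the corporate server longer to answer before Windows moves on; those numbers are only an example, not a recommendation.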


Another solution is to put a local caching DNS server on the network, and have the clients use that. You can use a DNS server that may be built in to a router, or install something like dnsmasq.


According to TechNet (http://blogs.technet.com/b/stdqry/archive/2011/12/15/dns-clients-and-timeouts-part-2.aspx), later queries are sent to multiple DNS servers in parallel.

And most people have only one network connection and a fast broadband connection and should normally expect a DNS response within 1s.

So I have set my DNSQueryTimeouts to 1 1 1 10 10 0 so that it gets to issuing parallel queries to all DNS servers as quickly as possible. Then I put my ISP's DNS servers first in the list (since they are physically closest and least likely to be subject to network packet drops), with several public DNS servers behind them, and let Windows do its stuff.

And my web browsing has sped up immensely!

Obviously don't give the ISP DNS servers priority in a corporate environment when you want to internally resolve internal hostnames!