How is DNS lookup order determined?

Sadly, the answer here is "it depends". The factors it depends on will vary with the domain and how the owning servers are set up as well as how your own local DNS is set up.

First, for example, regarding the NS records returned: it is perfectly allowed to randomise the order in which those records are returned, so the order may differ each time you request it. On the other hand, that is not done by all DNS implementations, so you might well get a statically ordered list. The point is that you cannot be sure.

Next, some DNS implementations will query each NS in parallel and use whichever one replies first. Others will query each one, work out which is fastest over some number of requests, and use that one. Others still will simply round-robin.
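To make those three strategies a bit more concrete, here is a minimal Python sketch. It is only a simulation: nothing is sent over the network, the latency figures are made up, and the function names are my own rather than anything from a real resolver. "Parallel" querying is modelled simply as picking the server with the lowest latency.

```python
import itertools

# Hypothetical round-trip times in milliseconds (invented for illustration).
LATENCY_MS = {
    "ns1.example.com": 40.0,
    "ns2.example.com": 15.0,
    "ns3.example.com": 90.0,
}


def first_to_reply(latency_ms):
    """Simulate querying every server at once and keeping the first reply."""
    return min(latency_ms, key=latency_ms.get)


def fastest_on_average(samples_ms):
    """Pick the server with the lowest mean latency over several probes."""
    return min(samples_ms, key=lambda ns: sum(samples_ms[ns]) / len(samples_ms[ns]))


def round_robin(servers):
    """Return an endless iterator that rotates through the servers in order."""
    return itertools.cycle(servers)
```

A real resolver would mix in timeouts and retries, and which of these strategies (if any) it uses depends on the implementation.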

There are multiple RFCs for DNS; two of the more useful ones that I have found are:

http://www.faqs.org/rfcs/rfc1912.html

http://www.faqs.org/rfcs/rfc1033.html

I realize this is something of a non-answer, without anything definitive for you to take away, but given the above, the only true way you have to determine the behavior for a given domain is to test.

The most common implementation I have seen on the client side, such as at ISPs around the world, is as follows:

1. Someone (like a broadband subscriber) asks the ISP's DNS servers to resolve the A record for foo.example.com.
2. The ISP checks its own cache, and if that record is cached and still considered "fresh," it's immediately returned from the cache. (This is how all DNS caches work, so that they don't needlessly strain the DNS servers of the site in question.)
3. If the ISP doesn't have that record cached, or if the cached copy is considered "stale/outdated," the ISP knows that it needs to resolve the latest record again.
4. Now the ISP needs to know what nameservers to query about the latest record.
5. The ISP begins by checking its cached list of the authoritative nameservers for the domain (these are the ns1.example.com, ns2.example.com and so on along with their IPs). If those records are still considered fresh, it skips down to step 8.
6. If the cached nameserver records were considered expired, or if it didn't have any cached records for that domain, the ISP queries the nameservers of the TLD (such as the .com registry's servers if it's a .com domain) to get the most up-to-date nameserver name/IP pairs for example.com. These are often loosely called "root" servers, but the actual root servers sit one level higher and only point at the TLD servers. (You can try this yourself via "dig @b.gtld-servers.net example.com" to see what the nameservers for your TLD know about your domain - if your domain belongs to the usual com/net/etc TLDs. For other TLDs you would have to query their respective TLD servers.)
7. The TLD nameservers always return the nameservers in the exact order they were specified by you; no randomization goes on. They also return the IPs for each nameserver; this is known as "GLUE" and is what allows the internet to solve the "chicken and egg" problem of how to resolve a nameserver hostname to an IP before knowing anything at all about a domain. Moreover, most of them (like the com/net/etc registries, which are the largest ones) use a cache time of 2 days so that they don't get hammered constantly with "what is the list of nameservers for domain X?" requests. This is the source of the common knowledge that you MUST wait 2 days until you can safely say that your new nameservers are known worldwide, after you've edited your nameserver list.
8. When the ISP knows example.com's nameservers and their IPs, such as ns1.example.com, ns2.example.com, ns3.example.com, the ISP now picks a random server from that list and sends off the query. (This is nice of them: they don't needlessly hammer all DNS servers of the site in question, and they assist further with load balancing by not always querying the first listed nameserver.)
9. If the ISP doesn't get a response from that nameserver within a specified timeout period, it queries another one on the list.
10. When it has a response, the ISP now stores it in its own local cache. As for how long it will remain cached: each record returned by any DNS server has a "soft expiry" time associated with it (its TTL, in seconds), which is how long the querying client (such as the ISP's DNS server) is allowed to cache that record before it has to query again. There's also a "hard expiry" time in the "SOA" (Start of Authority) record of the zone (you can see yours via "dig @ns1.example.com example.com -t soa"). Strictly speaking, that "expire" field is aimed at secondary nameservers rather than caches: it is how long a secondary may keep answering for the zone when it cannot reach the primary. Usually the soft expiry is anywhere from 30 minutes to 5 hours and the hard expiry is usually between 1-3 weeks.
11. After that exhaustive job, the ISP finally has the latest DNS record and can return it to the querying broadband subscriber, who is none the wiser about what a huge job has taken place behind the scenes!
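
The cache check, random server pick, timeout fallback and caching described above can be sketched roughly like this in Python. Treat it as an illustration under assumptions, not how any particular ISP resolver is written: the `Cache` class, the `resolve` function and the `send_query(server, name, rtype)` callable (which is expected to return a `(value, ttl)` pair or raise `TimeoutError`) are all hypothetical names of my own, and the nameserver lookup at the TLD is left out.

```python
import random
import time


class Cache:
    """Tiny TTL cache keyed by (name, record type)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (value, expires_at)

    def get(self, name, rtype):
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # TTL has run out: forget the record so it gets looked up again.
            del self._entries[(name, rtype)]
            return None
        return value

    def put(self, name, rtype, value, ttl):
        self._entries[(name, rtype)] = (value, self._clock() + ttl)


def resolve(name, rtype, cache, nameservers, send_query, rng=random):
    """Answer from cache if fresh, else ask the nameservers in random order."""
    cached = cache.get(name, rtype)
    if cached is not None:
        return cached
    candidates = list(nameservers)
    rng.shuffle(candidates)  # random first pick; the rest act as fallbacks
    for server in candidates:
        try:
            value, ttl = send_query(server, name, rtype)
        except TimeoutError:
            continue  # no reply in time, try another server on the list
        cache.put(name, rtype, value, ttl)
        return value
    raise LookupError(f"no nameserver answered for {rtype} {name}")
```

Passing the clock and the random generator in as parameters is just a convenience so the behavior can be exercised without waiting for real TTLs to run out.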

This process is repeated for EVERY record lookup. However, only the first query does the whole job; the nameserver IPs will be cached after that and subsequent queries to the ISP's caching DNS server will quickly be able to jump down to step 8.

Now, as for the randomization in step 8, it works at the record level. Let's say the broadband subscriber of that ISP asked about the following records:

A foo.example.com
A example.com
A www.example.com
MX example.com (an ISP customer shouldn't be asking for this record, but it's just an example)

Each record will be handled as its own separate "entity," independently cached and looked up. So, let's say the subscriber and ISP have never encountered the domain before and both have completely zero cached records. The lookups might be as follows:

A foo.example.com via ns1.example.com, then stored in ISP cache
A example.com via ns3.example.com, then stored in ISP cache
A www.example.com via ns2.example.com, then stored in ISP cache
MX example.com via ns3.example.com, then stored in ISP cache

Whenever the cached records are soft-expired, the process is repeated, so you don't even know that subsequent requests for that record will use the same server again.

It is therefore your absolute greatest goal to make sure that all of your DNS servers are completely in sync with each other, perfectly mirroring every DNS record across every server. You never know which server a DNS client will be hitting and you cannot rely on any order whatsoever. There is no such thing.
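
One way to check that is to collect the answers from each of your nameservers (for example with "dig @ns1.example.com ..." against each one) and compare them. Here is a small Python sketch of just the comparison step, assuming you already have the answers in a dictionary; the `find_mismatches` name, the data layout and the sample records are made up for illustration.

```python
def find_mismatches(zone_by_server):
    """Return the records whose answers differ between servers.

    zone_by_server maps server name -> {(name, rtype): set of values}.
    The result maps (name, rtype) -> {server: frozenset of values}.
    """
    all_keys = set()
    for records in zone_by_server.values():
        all_keys.update(records)

    mismatches = {}
    for key in sorted(all_keys):
        # A server that lacks the record entirely counts as an empty answer.
        answers = {
            server: frozenset(records.get(key, ()))
            for server, records in zone_by_server.items()
        }
        if len(set(answers.values())) > 1:
            mismatches[key] = answers
    return mismatches


# Hypothetical snapshot: ns2 is missing the newer MX record.
snapshot = {
    "ns1.example.com": {
        ("www.example.com", "A"): {"192.0.2.10"},
        ("example.com", "MX"): {"10 mail.example.com", "20 mail2.example.com"},
    },
    "ns2.example.com": {
        ("www.example.com", "A"): {"192.0.2.10"},
        ("example.com", "MX"): {"10 mail.example.com"},
    },
}
out_of_sync = find_mismatches(snapshot)
```

An empty result means every server gave the same answer for every record you collected; it says nothing about records you did not ask for.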

Further, as mentioned by Adam C, the server-level (example.com) DNS servers themselves could return NS records and randomize the order of those. It's very common for regular DNS servers to randomize their NS records on the slight off-chance that a poor DNS implementation always chooses the first returned nameserver. However, the TLD nameservers (mentioned earlier) will never randomize the list, and their list is what really matters when it comes to resolving the domain. That's why most implementations pick a random server from nameserver lists, to avoid always hitting the same server and overloading it.

Alright, that's your primer on how DNS works and what you should remember.

In short: Treat all of your DNS servers as if they were just one server, making it your highest goal in life to make sure that they are all equally capable of answering any query that might be thrown at them.

Disclaimer: Higher goals in life than managing DNS may be available but are sold separately, use your imagination. ;-)
