Why does DNS work the way it does?

This is a Canonical Question about DNS (Domain Name System).

If my understanding of the DNS system is correct, the .com registry holds a table that maps domains (www.example.com) to DNS servers.

  1. What is the advantage? Why not map directly to an IP address?

  2. If the only record that needs to change when I am configuring a DNS server to point to a different IP address is located at the DNS server, why isn't the process instant?

  3. If the only reason for the delay is DNS caches, is it possible to bypass them, so I can see what is happening in real time?


Solution 1:

Actually, it's more complicated than that - rather than one "central registry (that) holds a table that maps domains (www.example.com) to DNS servers", there are several layers of hierarchy.

There's a central registry (the Root Servers) which contains only a small set of entries: the NS (nameserver) records for all the top-level domains - .com, .net, .org, .uk, .us, .au, and so on.

Those servers just contain NS records for the next level down. To pick one example, the nameservers for the .uk domain just have entries for .co.uk, .ac.uk, and the other second-level zones in use in the UK.

Those servers just contain NS records for the next level down - to continue the example, they tell you where to find the NS records for google.co.uk. It's on those servers that you'll finally find a mapping between a hostname like www.google.co.uk and an IP address.
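If you want to watch this walk down the hierarchy for yourself, dig's +trace option starts at the root servers and follows each referral in turn (the hostname here is just the example from above; the output is long, so it isn't shown):

$ dig +trace www.google.co.uk A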

As an extra wrinkle, each layer will also serve up 'glue' records. Each NS record maps a domain to a hostname - for instance, the NS records for .uk list nsa.nic.uk as one of the servers. To get to the next level, we need to find out what the NS records for nic.uk are, and they turn out to include nsa.nic.uk as well. So now we need to know the IP of nsa.nic.uk, but to find that out we need to make a query to nsa.nic.uk, but we can't make that query until we know the IP for nsa.nic.uk...

To resolve this quandary, the servers for .uk add the A record for nsa.nic.uk into the ADDITIONAL SECTION of the response (response below trimmed for brevity):

jamezpolley@li101-70:~$ dig nic.uk ns

; <<>> DiG 9.7.0-P1 <<>> nic.uk ns
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21768
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 14

;; QUESTION SECTION:
;nic.uk.                IN  NS

;; ANSWER SECTION:
nic.uk.         172800  IN  NS  nsb.nic.uk.
nic.uk.         172800  IN  NS  nsa.nic.uk.

;; ADDITIONAL SECTION:
nsa.nic.uk.     172800  IN  A   156.154.100.3
nsb.nic.uk.     172800  IN  A   156.154.101.3

Without these extra glue records, we'd never be able to find the nameservers for nic.uk, and so we'd never be able to look up any domains hosted there.
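You can see one of these referrals, glue and all, first-hand by asking a root server directly about a name it only delegates (a.root-servers.net is used here, but any root server will do). Because the root isn't authoritative for nic.uk, it answers with a referral to the .uk servers: their NS records in the AUTHORITY section, and their addresses as glue in the ADDITIONAL section:

$ dig +norecurse @a.root-servers.net nic.uk NS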

To get back to your questions...

a) What is the advantage? Why not map directly to an IP address?

For one thing, it allows edits to each individual zone to be distributed. If you want to update the entry for www.mydomain.co.uk, you just need to edit the information on mydomain.co.uk's nameservers. There's no need to notify the central .co.uk servers, or the .uk servers, or the root nameservers. If there were a single central registry that mapped every level of the hierarchy, it would have to be notified about every single change to any DNS entry anywhere in the world, and it would be absolutely swamped with traffic.

Before 1982, this was actually how name resolution happened. One central registry was notified about all updates, and they distributed a file called hosts.txt which contained the hostname and IP address of every machine on the internet. A new version of this file was published every few weeks, and every machine on the internet would have to download a new copy. Well before 1982, this was starting to become problematic, and so DNS was invented to provide a more distributed system.

For another thing, this would be a Single Point of Failure - if the single central registry went down, the entire internet would be offline. Having a distributed system means that failures only affect small sections of the internet, not the whole thing.

(To provide extra redundancy, there are actually 13 separate clusters of servers that serve the root zone. Any changes to the top-level domain records have to be pushed to all 13; imagine having to coordinate updating all 13 of them for every single change to any hostname anywhere in the world...)

b) If the only record that needs to change when I am configuring a DNS server to point to a different IP address is located at the DNS server, why isn't the process instant?

Because DNS utilises a lot of caching, both to speed things up and to decrease the load on the NSes. Without caching, every single time you visited google.co.uk your computer would have to go out to the network to look up the servers for .uk, then .co.uk, then google.co.uk, then www.google.co.uk. Those answers don't actually change much, so looking them up every time is a waste of time and network traffic. Instead, when the NS returns records to your computer, it includes a TTL value that tells your computer to cache the results for that number of seconds.
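You can watch the countdown happen: ask your resolver for the same records twice, a few seconds apart, and the TTL on the second answer will be lower, because the resolver is serving you its cached copy and telling you how much longer it considers it valid (+noall +answer just trims dig's output down to the answer records):

$ dig google.co.uk NS +noall +answer
$ sleep 30
$ dig google.co.uk NS +noall +answer

Once the TTL reaches zero, the resolver throws the records away and fetches a fresh copy on the next query.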

For example, the NS records for .uk have a TTL of 172800 seconds - 2 days. Google are even more conservative - the NS records for google.co.uk have a TTL of 4 days. Services which rely on being able to update quickly can choose a much lower TTL - for instance, telegraph.co.uk has a TTL of just 600 seconds on their NS records.
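To check what TTL a zone is advertising, ask one of its authoritative servers directly, so that no cache gets a chance to age the answer - here using nsa.nic.uk, one of the .uk servers we met above:

$ dig +noall +answer @nsa.nic.uk uk NS

An authoritative server always reports the full TTL; only cached copies count down.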

If you want updates to your zone to be near-instant, you can choose to lower your TTL as far down as you like. The lower you set it, the more traffic your servers will see, as clients refresh their records more often. Every time a client has to contact your servers to do a query, this causes some lag, as it's slower than looking up the answer in their local cache; so you'll also want to consider the tradeoff between fast updates and a fast service.

c) If the only reason for the delay is DNS caches, is it possible to bypass them, so I can see what is happening in real time?

Yes, this is easy if you're testing manually with dig or similar tools - just tell it which server to contact.

Here's an example of a cached response:

jamezpolley@host:~$ dig telegraph.co.uk NS

; <<>> DiG 9.7.0-P1 <<>> telegraph.co.uk NS
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36675
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;telegraph.co.uk.       IN  NS

;; ANSWER SECTION:
telegraph.co.uk.    319 IN  NS  ns1-63.akam.net.
telegraph.co.uk.    319 IN  NS  eur3.akam.net.
telegraph.co.uk.    319 IN  NS  use2.akam.net.
telegraph.co.uk.    319 IN  NS  usw2.akam.net.
telegraph.co.uk.    319 IN  NS  use4.akam.net.
telegraph.co.uk.    319 IN  NS  use1.akam.net.
telegraph.co.uk.    319 IN  NS  usc4.akam.net.
telegraph.co.uk.    319 IN  NS  ns1-224.akam.net.

;; Query time: 0 msec
;; SERVER: 97.107.133.4#53(97.107.133.4)
;; WHEN: Thu Feb  2 05:46:02 2012
;; MSG SIZE  rcvd: 198

The flags section here doesn't contain the aa flag, so we can see that this result came from a cache rather than directly from an authoritative source. In fact, we can see that it came from 97.107.133.4, which happens to be one of Linode's local DNS resolvers. The fact that the answer was served out of a cache very close to me means that it took 0 msec for me to get an answer; but as we'll see in a moment, the price I pay for that speed is that the answer is almost 5 minutes out of date.

To bypass Linode's resolver and go straight to the source, just pick one of those NSes and tell dig to contact it directly:

jamezpolley@li101-70:~$ dig @ns1-224.akam.net telegraph.co.uk NS

; <<>> DiG 9.7.0-P1 <<>> @ns1-224.akam.net telegraph.co.uk NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23013
;; flags: qr aa rd; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;telegraph.co.uk.       IN  NS

;; ANSWER SECTION:
telegraph.co.uk.    600 IN  NS  use2.akam.net.
telegraph.co.uk.    600 IN  NS  eur3.akam.net.
telegraph.co.uk.    600 IN  NS  use1.akam.net.
telegraph.co.uk.    600 IN  NS  ns1-63.akam.net.
telegraph.co.uk.    600 IN  NS  usc4.akam.net.
telegraph.co.uk.    600 IN  NS  ns1-224.akam.net.
telegraph.co.uk.    600 IN  NS  usw2.akam.net.
telegraph.co.uk.    600 IN  NS  use4.akam.net.

;; Query time: 9 msec
;; SERVER: 193.108.91.224#53(193.108.91.224)
;; WHEN: Thu Feb  2 05:48:47 2012
;; MSG SIZE  rcvd: 198

You can see that this time, the results were served directly from the source - note the aa flag, which indicates that the results came from an authoritative source. In my earlier example, the results came from my local cache, so they lack the aa flag. I can see that the authoritative source for this domain sets a TTL of 600 seconds. The results I got earlier from a local cache had a TTL of just 319 seconds, which tells me that they'd been sitting in the cache for (600-319) seconds - almost 5 minutes - before I saw them.

Although the TTL here is only 600 seconds, some ISPs will attempt to reduce their traffic even further by forcing their DNS resolvers to cache the results for longer - in some cases, for 24 hours or more. It's traditional (in a we-don't-know-if-this-is-really-necessary-but-let's-be-safe kind of way) to assume that any DNS change you make won't be visible everywhere on the internet for 24-48 hours.
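A rough way to watch propagation is to put the same question to several different resolvers and compare the answers and TTLs you get back - for instance your ISP's resolver, a public resolver (Google's 8.8.8.8 is used below purely as a well-known example), and the authoritative server itself:

$ dig @8.8.8.8 telegraph.co.uk NS +noall +answer
$ dig @ns1-224.akam.net telegraph.co.uk NS +noall +answer

Any resolver still serving an old cached copy will show you an answer (and a TTL) that differs from the authoritative one.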

Solution 2:

a) The number of host name --> IP mappings in the world is REALLY big. This system distributes the responsibility of hosting all the subdomain and MX records and every other DNS record out to the owner of the domain name - which was pretty much the point of the domain name system. .com is held by one registry, whereas .uk can be held by another. Likewise, example.com and otherexample.com can be hosted separately, so the resources can be distributed.

b) It's cached, which reduces the number of hits on your DNS host to a small fraction of what it would be otherwise. Records commonly live in a cache for up to 2 days before being discarded; the exact lifetime is controlled by the TTL (time to live) of each record.

c) You can effectively stop records being cached by setting a really short TTL. This is NOT recommended unless you're using it for dynamic DNS. Caching drops the hits on the DNS server by a lot - to pick a guessed number out of the air, we're talking about knocking off 95% of the requests.

Solution 3:

If you're on a *nix system, download a copy of Dan Bernstein's djbdns from http://cr.yp.to/djbdns.html and run his dnstrace program to see how the recursive query system works. It's very informative.

Solution 4:

a) The number of possible domain names is way too large for one server to handle. And there's not just .com; there's .net, .org, .se, .info, and any number of others. Add to that, you can delegate responsibility for a subdomain (which is effectively what .com does). DNS being less centralized makes all that easier to manage.
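You can see that delegation in action by asking one of the .com servers about a domain it delegates - a.gtld-servers.net below is one of the .com nameservers. Since it isn't authoritative for the domain itself, it hands back a referral (the NS records appear in the AUTHORITY section) rather than a final answer:

$ dig +norecurse @a.gtld-servers.net serverfault.com NS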

b) Machines along the way from the user to you have DNS caches in order to minimize the number of requests needed. It prevents, for example, spamming the network with requests for the address of "serverfault.com" every time you get a page from SF. Those servers can even cache "domain doesn't exist" results, which is why it can take a while for even a brand new domain to show up.
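You can see that negative caching being set up by digging for a name that doesn't exist (the hostname below is made up for the purpose). The NXDOMAIN response carries the zone's SOA record in the AUTHORITY section, and the SOA's negative-caching TTL tells resolvers how long to remember that the name is missing:

$ dig no-such-host.serverfault.com A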

c) Although you can disable your own cache, there are often other DNS servers between your machine and yourdomain.com's DNS server. Your ISP's DNS server, for example, will try to cache as much as it can. The only records that get updated relatively quickly around the net are ones with a short TTL (which basically says "I'm only valid for a few seconds; after that, ask me again for current info"). The reason TTLs are so high, though, is so that the server responsible for the domain can offload some of the work to other servers. If you had all the net contacting your one or two rinky-dink DNS servers for every hit to your web site, they'd be pretty much unusable the second someone saw your site on /., digg, etc.