How long does negative DNS caching typically last?
If a DNS server looks up a record and it's missing, it will often "negatively cache" the fact that this record is missing, and not try to look it up again for a while. I don't see anything in the RFC about what the TTL on negative caching should be, so I'm guessing it's somewhat arbitrary. In the real world, how long do these negative records stick around for?
The TTL for negative caching is not arbitrary. It is taken from the SOA record at the top of the zone to which the requested record would have belonged, had it existed. For example:
example.org. IN SOA master-ns1.example.org. Hostmaster.example.org. (
2012091201 43200 1800 1209600 86400 )
The last value in the SOA record ("86400") is the amount of time clients are asked to cache negative results under example.org. If a client requests doesnotexist.example.org., it will cache the result for 86400 seconds (or for the SOA record's own TTL, if that is lower).
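To make the field positions concrete, here is a minimal Python sketch that splits the SOA record above into its named fields. The `SoaRecord` class and `parse_soa` helper are hypothetical names made up for this illustration, and the parser only handles the simple layout shown here, not full zone-file syntax.

```python
from dataclasses import dataclass


@dataclass
class SoaRecord:
    mname: str    # primary nameserver
    rname: str    # responsible party's mailbox
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int  # used for negative caching (RFC 2308)


def parse_soa(text: str) -> SoaRecord:
    """Parse a simple zone-file style SOA record (hypothetical helper)."""
    # Drop the parentheses that let the record span several lines.
    tokens = text.replace("(", " ").replace(")", " ").split()
    # Everything after the "SOA" keyword is the record data.
    rdata = tokens[tokens.index("SOA") + 1:]
    mname, rname = rdata[0], rdata[1]
    serial, refresh, retry, expire, minimum = (int(v) for v in rdata[2:7])
    return SoaRecord(mname, rname, serial, refresh, retry, expire, minimum)


record = parse_soa("""
example.org. IN SOA master-ns1.example.org. Hostmaster.example.org. (
    2012091201 43200 1800 1209600 86400 )
""")
print(record.minimum)  # 86400
```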
This depends on your exact definition of a "negative query", but in either case, this is documented in rfc2308 «Negative Caching of DNS Queries (DNS NCACHE)»:
- NXDOMAIN: If the resolution is successful, and results in NXDOMAIN, the response will come with a SOA record, which would contain the NXDOMAIN TTL (traditionally known as the MINIMUM field). rfc2308#section-4
- SERVFAIL: If the resolution is not successful, and results in a timeout (SERVFAIL), then it may as well not be cached at all, and in all circumstances MUST NOT be cached for longer than 5 minutes. rfc2308#section-7.1
Note that in practice, caching such results for the full allowable 5 minutes is a great way to diminish the experience of a client should their cache server occasionally suffer brief connectivity issues (and effectively make it easily vulnerable to a Denial-of-Service amplification, where a few seconds of downtime would result in certain parts of the DNS being down for the full five minutes).
Prior to BIND 9.9.6-S1 (released in 2014), apparently, SERVFAIL was not cached at all. a878301 (2014-09-04)
E.g., at the time of your question and in all versions of BIND released prior to 2014, the BIND recursive resolver DID NOT cache SERVFAIL at all, if the above commit and the documentation about the first introduction in 9.9.6-S1 is to be believed.
In the latest BIND, the default servfail-ttl is 1s, and the setting is hardcoded to a ceiling of 30s (in place of the RFC-mandated ceiling of 300s). 90174e6 (2015-10-17)
Furthermore, the following are some noteworthy quotes on the matter:
- https://kb.isc.org/article/AA-01178/ (2014/2016-01-07)
  The outcome of caching SERVFAIL responses has included some situations where it was seen to be detrimental to the client experience, particularly when the causes of the SERVFAIL being presented to the client were transient and from a scenario where an immediate retry of the query would be a more appropriate action.
- http://cr.yp.to/djbdns/third-party.html (2003-01-11)
  The second tactic is to claim that widespread DNS clients will do something Particularly Evil when they are unable to reach all DNS servers. The problem with this argument is that the claim is false. Any such client is clearly buggy, and will be unable to survive in the marketplace: consider what happens if the client's routers briefly go down, or if the client's network is temporarily flooded.
In summary, an NXDOMAIN response would be cached as specified in the SOA of the applicable zone, whereas SERVFAIL is unlikely to be cached, or, if cached, it'll be at most a double-digit number of seconds.
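The summary above could be sketched as a resolver-side decision function. This is an illustration only, not how BIND or any other resolver is actually implemented: `negative_cache_seconds` and its parameters are hypothetical names, and the only figures taken from the text are the 5-minute RFC 2308 ceiling and BIND's 1-second default.

```python
from typing import Optional

RFC2308_SERVFAIL_CEILING = 300  # seconds (rfc2308#section-7.1)


def negative_cache_seconds(rcode: str,
                           authority_soa_ttl: Optional[int] = None,
                           servfail_ttl: int = 1) -> int:
    """Return how long a negative response may be cached (illustrative)."""
    if rcode == "NXDOMAIN":
        # The TTL comes from the SOA record in the authority section;
        # without one, this sketch simply does not cache the response.
        return authority_soa_ttl if authority_soa_ttl is not None else 0
    if rcode == "SERVFAIL":
        # Operator-chosen value, clamped to the RFC upper bound.
        return max(0, min(servfail_ttl, RFC2308_SERVFAIL_CEILING))
    return 0


print(negative_cache_seconds("NXDOMAIN", authority_soa_ttl=86400))  # 86400
print(negative_cache_seconds("SERVFAIL"))                           # 1
print(negative_cache_seconds("SERVFAIL", servfail_ttl=3600))        # 300
```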
There is an RFC dedicated to this topic: RFC 2308 - Negative Caching of DNS Queries (DNS NCACHE).
The relevant section to read is 5 - Caching Negative Answers which states:
Like normal answers negative answers have a time to live (TTL). As there is no record in the answer section to which this TTL can be applied, the TTL must be carried by another method. This is done by including the SOA record from the zone in the authority section of the reply. When the authoritative server creates this record its TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL. This TTL decrements in a similar manner to a normal cached answer and upon reaching zero (0) indicates the cached negative answer MUST NOT be used again.
Firstly, let's identify the SOA.MINIMUM and SOA TTL described in the RFC. The TTL is the number before the record class IN (900 seconds in the example below), while the minimum is the last field in the record (86400 seconds in the example below).
$ dig serverfault.com soa @ns-1135.awsdns-13.org +noall +answer +multiline
; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> serverfault.com soa @ns-1135.awsdns-13.org +noall +answer +multiline
;; global options: +cmd
serverfault.com. 900 IN SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. (
1 ; serial
7200 ; refresh (2 hours)
900 ; retry (15 minutes)
1209600 ; expire (2 weeks)
86400 ; minimum (1 day)
)
Now let's look at some examples. The serverfault.com zone is illustrative, as it has authoritative servers from two different providers that are configured differently.
Let's find the authoritative nameservers for the serverfault.com zone:
$ host -t ns serverfault.com
serverfault.com name server ns-860.awsdns-43.net.
serverfault.com name server ns-1135.awsdns-13.org.
serverfault.com name server ns-cloud-c1.googledomains.com.
serverfault.com name server ns-cloud-c2.googledomains.com.
Then check the SOA record using an AWS nameserver:
$ dig serverfault.com soa @ns-1135.awsdns-13.org | grep 'ANSWER SECTION' -A 1
;; ANSWER SECTION:
serverfault.com. 900 IN SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
From this we can see that the TTL of the SOA record is 900 seconds while the negative TTL value is 86400 seconds. The SOA TTL value of 900 is lower, so we expect this value to be used.
Now if we query an authoritative server for a nonexistent domain, we should get a response without an answer and with a SOA record in the authority section:
$ dig nxdomain.serverfault.com @ns-1135.awsdns-13.org
; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> nxdomain.serverfault.com @ns-1135.awsdns-13.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51948
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;nxdomain.serverfault.com. IN A
;; AUTHORITY SECTION:
serverfault.com. 900 IN SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
;; Query time: 125 msec
;; SERVER: 205.251.196.111#53(205.251.196.111)
;; WHEN: Tue Aug 20 15:49:47 NZST 2019
;; MSG SIZE rcvd: 135
When a recursive (caching) resolver receives this answer, it will parse the SOA record in the AUTHORITY SECTION and use the TTL of this record to determine how long it should cache the negative result (in this case 900 seconds).
Now let's follow the same procedure with a Google nameserver:
$ dig serverfault.com soa @ns-cloud-c2.googledomains.com | grep 'ANSWER SECTION' -A 1
;; ANSWER SECTION:
serverfault.com. 21600 IN SOA ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
You can see that the Google nameservers have different values for both the SOA TTL and the negative TTL. In this case the negative TTL of 300 is lower than the SOA TTL of 21600. Therefore the Google server should use the lower value in the AUTHORITY SECTION SOA record when returning an NXDOMAIN response:
$ dig nxdomain.serverfault.com @ns-cloud-c2.googledomains.com
; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> nxdomain.serverfault.com @ns-cloud-c2.googledomains.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 25920
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;nxdomain.serverfault.com. IN A
;; AUTHORITY SECTION:
serverfault.com. 300 IN SOA ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
;; Query time: 130 msec
;; SERVER: 216.239.34.108#53(216.239.34.108)
;; WHEN: Tue Aug 20 16:05:24 NZST 2019
;; MSG SIZE rcvd: 143
As expected, the TTL of the SOA record in the NXDOMAIN response is 300 seconds.
The example above also demonstrates how easy it is to get different answers to the same query. The answer that an individual caching resolver ends up using is down to which authoritative nameserver was queried.
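A small Python sketch of that effect, using only the SOA values observed above. How a real resolver chooses among a zone's authoritative servers is not covered here, so the choice is modelled as a random pick purely as an assumption, and `negative_ttl_from` is a hypothetical helper name.

```python
import random

# (SOA TTL, SOA.MINIMUM) copied from the two SOA records shown above.
SOA_BY_SERVER = {
    "ns-1135.awsdns-13.org": (900, 86400),
    "ns-cloud-c2.googledomains.com": (21600, 300),
}


def negative_ttl_from(server: str) -> int:
    """Negative TTL the given authoritative server is expected to hand out."""
    soa_ttl, soa_minimum = SOA_BY_SERVER[server]
    return min(soa_ttl, soa_minimum)


for server in SOA_BY_SERVER:
    print(server, negative_ttl_from(server))

# Assumption: server selection modelled as a uniform random choice.
picked = random.choice(list(SOA_BY_SERVER))
print("negative answer cached for", negative_ttl_from(picked), "seconds via", picked)
```

Depending on the pick, the same NXDOMAIN answer is cached for either 900 or 300 seconds.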
In my testing I have also observed that some recursive (caching) resolvers do not return an AUTHORITY SECTION with a SOA record with a decrementing TTL for subsequent requests, whereas others do.
For example, the Cloudflare resolver does (note the decrementing TTL value):
$ dig nxdomain.serverfault.com @1.1.1.1 | grep 'AUTHORITY SECTION' -A 1
;; AUTHORITY SECTION:
serverfault.com. 674 IN SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
$ dig nxdomain.serverfault.com @1.1.1.1 | grep 'AUTHORITY SECTION' -A 1
;; AUTHORITY SECTION:
serverfault.com. 668 IN SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
While the default resolver in an AWS VPC will respond with an authority section only on the first request:
$ dig nxdomain.serverfault.com @169.254.169.253 | grep 'AUTHORITY SECTION' -A 1
;; AUTHORITY SECTION:
serverfault.com. 300 IN SOA ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
$ dig nxdomain.serverfault.com @169.254.169.253 | grep 'AUTHORITY SECTION' -A 1 | wc -l
0
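The decrementing TTL seen in the Cloudflare output can be modelled with a tiny negative cache that stores an expiry time and reports the seconds remaining. This is a sketch under assumptions, not a description of how Cloudflare's or AWS's resolvers work: `NegativeCache` is a hypothetical class, and the clock is injectable so the example runs without waiting.

```python
import time
from typing import Callable, Dict, Optional


class NegativeCache:
    """Minimal negative cache keyed by query name (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def store(self, name: str, ttl: int) -> None:
        # Remember when the negative answer stops being usable.
        self._expires_at[name] = self._clock() + ttl

    def remaining_ttl(self, name: str) -> Optional[int]:
        expires_at = self._expires_at.get(name)
        if expires_at is None:
            return None
        remaining = int(expires_at - self._clock())
        if remaining <= 0:
            # At zero the cached negative answer must not be used again.
            del self._expires_at[name]
            return None
        return remaining


now = [0.0]  # fake clock, advanced by hand
cache = NegativeCache(clock=lambda: now[0])
cache.store("nxdomain.serverfault.com.", 900)

now[0] = 226.0
print(cache.remaining_ttl("nxdomain.serverfault.com."))  # 674
now[0] = 232.0
print(cache.remaining_ttl("nxdomain.serverfault.com."))  # 668
now[0] = 900.0
print(cache.remaining_ttl("nxdomain.serverfault.com."))  # None
```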
Note: This answer addresses the behavior of NXDOMAIN answers.
Glossary:
- Zone
- SOA
- TTL
- Recursive NameServer
- Authoritative Nameserver