Caching DNS returns SERVFAIL for NS record, but dig +trace disagrees?
This question is similar, but doesn't elaborate on the confusing case of why an NS record cannot be obtained.
One of our caching DNS environments (RHEL 5.8, BIND 9.3.6-20.P1.el5_8.4) has ceased to return any useful data at all for a zone. Usually this sort of problem ends up being a stale NS or glue record, but in this particular case I can't seem to even get the cache to report an NS record for the zone.
- dig @mycache somedomain NS returns SERVFAIL. There are no nameserver records cached at all.
- dig +trace shows a healthy delegation path, with the final nameserver returning a response. Manually running the dig query against the final nameserver returns a valid NS record, the corresponding A record exists and agrees with the glue, etc.
What gives? Why is there no NS record for me to obtain from the DNS cache, not even a bad one?
If there's no authoritative answer for an NS record, then there's nothing to cache other than the failure to determine the authority. That failure is what has been cached, and a DNS client cannot retrieve a server's in-memory information about lame nameservers. The SERVFAIL is as close as you're going to get.
Usually you can identify a problem with stale nameserver records by comparing the NS record in cache to what you find on the internet, but in this case there is no authoritative NS record to cache. Glue records are not authoritative in and of themselves; with no authoritative answer, there is simply no authoritative nameserver.
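To make that comparison less error-prone, the two details worth reading from each dig response are the header status and whether the aa (authoritative answer) flag is set. The helpers below are only a rough sketch that parses dig's default header comments from stdin; the function names are made up for illustration, and the parsing assumes the usual ";; ->>HEADER<<-" and ";; flags:" lines.

```shell
# Hypothetical helpers (not part of dig or BIND): they only parse dig's
# default text output, read from stdin.

# Print the header status (NOERROR, SERVFAIL, NXDOMAIN, ...).
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Succeed (exit 0) if the header flags include "aa", i.e. the server
# that answered claims authority for the data.
dig_is_authoritative() {
    grep -Eq '^;; flags:[^;]* aa[ ;]'
}
```

For example, dig @mycache somedomain NS | dig_status should print SERVFAIL in the situation described, while dig @ns1.somedomain somedomain NS +norecurse | dig_is_authoritative && echo authoritative checks the final nameserver directly (ns1.somedomain is a placeholder for whatever dig +trace reports last).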
One of two things is usually happening here:
- dig +trace is getting a stale answer for an intermediate nameserver from your local cache, and there really is a problem going on at the moment. I've covered this behavior in another question.
- The caching server encountered NXDOMAIN or SERVFAIL when chasing glue records to find an authoritative nameserver, and this event has been cached. Even if the problem has been corrected, or the glue has been pointed somewhere else, the nameserver isn't going to try asking for it again until an internal timer expires. Requesting a cache purge for the zone in question will usually reset it.
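For the second case, a purge followed by a fresh query might look like the sketch below. It assumes rndc is run on the caching server itself (or is configured to reach it) and that the installed BIND supports rndc flushname, which removes a single name from the cache; if it doesn't, or if the failure is stored under related names such as the nameservers' own hostnames, the blunter rndc flush clears the whole cache. The function name and the RNDC/DIG override variables are made up for illustration.

```shell
# Hypothetical wrapper: flush one name from the cache, then ask again.
# RNDC and DIG can be overridden, e.g. RNDC=echo DIG=echo for a dry run.
flush_and_recheck() {
    cache=$1    # caching server to query, e.g. mycache
    zone=$2     # name to purge and re-query, e.g. somedomain
    if [ -z "$cache" ] || [ -z "$zone" ]; then
        echo "usage: flush_and_recheck CACHE ZONE" >&2
        return 2
    fi
    ${RNDC:-rndc} flushname "$zone" || return 1
    # Show only the header comments and the answer section.
    ${DIG:-dig} "@$cache" "$zone" NS +noall +comments +answer
}
```

Running flush_and_recheck mycache somedomain should then either return the NS records or fail again with a freshly generated SERVFAIL, which at least tells you the problem is live rather than cached.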
The latter case is usually the culprit. If you want to be absolutely sure, it may be possible to dump your nameserver's runtime cache and view the glue in memory (e.g. with BIND's rndc dumpdb). Be advised that this is a very expensive operation unless you can limit the scope of the dump to a single zone, and it is generally something to avoid under high load.
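If you do go the dump route, a small filter keeps the output readable. This is a sketch only: the default path below is an assumption based on a typical RHEL named.conf (the real location is whatever dump-file is set to, or named_dump.db in named's working directory), and the function name is made up. Note that filtering happens after the dump is written, so it does nothing to reduce the cost of the dump itself.

```shell
# Hypothetical helper: print the lines of a cache dump that mention a zone.
# grep -F treats the zone as a literal string, so dots are not wildcards.
grep_cache_dump() {
    zone=$1
    dump=${2:-/var/named/data/cache_dump.db}   # assumed path; check dump-file
    grep -iF -- "$zone" "$dump"
}
```

Typical use would be rndc dumpdb -cache (on versions that accept the option), a short wait for named to finish writing the file, then grep_cache_dump somedomain.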