How are DNS timeouts supposed to work?

I recently had a problem where a remote service requesting the IP address for my server (with a hosted DNS provider) was responding with:

DNS problem: SERVFAIL looking up A for mysql.xavamedia.nl

(Update: the remote service mentioned here is Let's Encrypt; I filed a bug in their issue tracker, which led me down this path.)

In testing on my local network, I saw that I sometimes get an empty DNS response from the hosted DNS server. Apparently the problem is intermittent: it happens only when the DNS records are not in the cache, and only when the DNS server is very busy.

Here's a Wireshark description of an empty response message:

Wireshark screenshot of empty response

Of course, since most DNS queries and responses are sent over UDP, a local resolver will just wait a while for the response and then give up. What I am now left wondering is: are there guidelines for DNS response times? My DNS hosting provider sort of shrugged and said that my local resolver sent the empty response too soon. I've never had this problem before, but I'm surprised at the failure mode: an empty DNS response without an error code.
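To make the "wait a while, then give up" behaviour concrete, here is a minimal sketch of a single UDP query with a timeout, using only Python's standard library. It hand-builds an RFC 1035 A query; the function names, the `port` parameter and the 5-second default are my own choices for illustration, not taken from any particular resolver.

```python
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal RFC 1035 query for `name` (qtype 1 = A) with RD set."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero-length root label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS 1 (IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def query_udp(server: str, name: str, timeout: float = 5.0, port: int = 53) -> Optional[bytes]:
    """Send one query over UDP and return the raw reply, or None on timeout."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP has no connection state, so a timer is the only way to notice a lost reply
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        try:
            reply, _addr = sock.recvfrom(4096)
        except socket.timeout:
            return None  # nothing arrived in time: the caller retries or gives up
    if reply[:2] != packet[:2]:
        return None  # ID mismatch; a real resolver would keep waiting for the right reply
    return reply
```

For example, `query_udp("192.0.2.53", "mysql.xavamedia.nl")` (192.0.2.53 is a placeholder documentation address) returns `None` if no reply arrives within 5 seconds, which is exactly the point where a stub resolver decides to retry or fail.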

Does someone know of some guidelines on how this is supposed to work, and when/how I can prove my DNS hosting is doing something wrong?


Solution 1:

The empty response that you're looking at is a synthetic state known as NODATA: the response code is NOERROR, but the answer section is empty. NODATA and NXDOMAIN both indicate that no record of the requested name and type exists, but NXDOMAIN says the name itself does not exist, which applies to all names beneath it as well. NODATA is advising that either the name is associated with records of an unrequested type, or that there are other records beneath the name you're requesting (e.g. example.test.xavamedia.nl.).
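A rough way to tell these outcomes apart in a raw reply is to look at the RCODE and ANCOUNT fields of the 12-byte header. This is a simplified sketch: it only inspects the header, so it cannot distinguish NODATA from a referral (which also has RCODE NOERROR and an empty answer section), and it ignores CNAME chains. A fuller check would also look for an SOA record in the authority section.

```python
import struct

NOERROR, SERVFAIL, NXDOMAIN = 0, 2, 3  # RCODE values from RFC 1035


def classify_response(reply: bytes) -> str:
    """Classify a raw DNS reply using only its header (RFC 1035, section 4.1.1)."""
    if len(reply) < 12:
        raise ValueError("truncated DNS header")
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    rcode = flags & 0x000F  # RCODE is the low four bits of the flags word
    if rcode == NXDOMAIN:
        return "NXDOMAIN"  # the name (and everything beneath it) does not exist
    if rcode == SERVFAIL:
        return "SERVFAIL"  # the server could not produce an answer
    if rcode == NOERROR and ancount == 0:
        return "NODATA"  # no error, but no records of the requested type
    if rcode == NOERROR:
        return "ANSWER"
    return f"RCODE {rcode}"
```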

Your takeaway from NODATA and NXDOMAIN is effectively the same in this context: the record of the requested name and type did not exist. An authoritative nameserver was reached for the requested domain, and it replied that a record of that name and type did not exist. This is not a communication error; the authoritative server said that it didn't have the data. More than likely the server you were talking to had already processed this request and negatively cached the absence of that record within the last four hours. (14400 seconds is the negative cache interval defined by the SOA record for xavamedia.nl.)
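The negative caching described above can be sketched as follows. Per RFC 2308, section 5, a negative answer is cached for the smaller of the SOA record's own TTL and its MINIMUM field. The class below is a hypothetical illustration that applies that rule per (name, type), which matches NODATA; a real NXDOMAIN entry covers every type for the name, which this sketch does not model. The injectable `clock` is only there to make it testable.

```python
import time
from typing import Callable, Dict, Tuple


def negative_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """RFC 2308 section 5: cache a negative answer for min(SOA TTL, SOA MINIMUM)."""
    return min(soa_ttl, soa_minimum)


class NegativeCache:
    """Remembers (name, type) pairs that recently produced a negative answer."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expiry: Dict[Tuple[str, int], float] = {}

    @staticmethod
    def _key(name: str, qtype: int) -> Tuple[str, int]:
        # DNS names compare case-insensitively; ignore a trailing root dot
        return (name.lower().rstrip("."), qtype)

    def add(self, name: str, qtype: int, soa_ttl: int, soa_minimum: int) -> None:
        self._expiry[self._key(name, qtype)] = self._clock() + negative_ttl(soa_ttl, soa_minimum)

    def is_negative(self, name: str, qtype: int) -> bool:
        key = self._key(name, qtype)
        expires = self._expiry.get(key)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expiry[key]  # expired: the next lookup goes to the authoritative servers again
            return False
        return True
```

With a MINIMUM of 14400, a record that was missing once keeps being reported as missing for up to four hours, even if it has since been added on the authoritative side.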

Neither NXDOMAIN nor NODATA by itself will result in a timeout when encountered in this instance, but your resolver library will probably move on from here to appending the DNS search suffix, which may in turn trigger a timeout if the authoritative DNS servers for the search domain do not respond.
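Which names get tried, and in what order, depends on the stub resolver. The following is a rough sketch of the `ndots` rule described in the resolv.conf(5) man page (default `ndots` of 1): names with at least `ndots` dots are tried as-is first, shorter names get the search list first, and a name with a trailing dot is never suffixed. The function name is hypothetical, and the search domain `example.com` in the usage line is a made-up value.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Approximate order in which a resolv.conf-style stub resolver tries names."""
    if name.endswith("."):
        return [name]  # fully qualified: no search suffixes are appended
    suffixed = [f"{name}.{domain.strip('.')}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + suffixed  # "enough" dots: try the name as-is first
    return suffixed + [absolute]  # otherwise the search list comes first
```

For example, `candidate_names("mysql.xavamedia.nl", ["example.com"])` gives `["mysql.xavamedia.nl.", "mysql.xavamedia.nl.example.com."]`: after a NODATA or NXDOMAIN for the first name, the second query goes to the search domain's servers, and that is where a timeout could appear.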

It should be noted that none of this explains why your lookup of mysql.xavamedia.nl. got a SERVFAIL response. That points at a problem with the recursive server getting the answer from the authoritative servers. Either the authoritative server replied with SERVFAIL, the recursive server could not reach any of the authoritative servers, or the recursive server determined that the data returned was invalid. None of this can be proven with the information that you've provided.
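Those three causes can be modelled as a recursive server walking the list of authoritative servers and giving up only when none of them produces a usable answer. This is an illustrative sketch, not how any particular recursive server is implemented: `ask` and `validate` are hypothetical callbacks standing in for the network query and for checks such as DNSSEC validation, and replies are plain strings to keep it short.

```python
from typing import Callable, Iterable, Optional


def resolve_via(
    servers: Iterable[str],
    ask: Callable[[str], Optional[str]],
    validate: Callable[[str], bool] = lambda answer: True,
) -> str:
    """Try each authoritative server in turn; report SERVFAIL if all of them fail.

    `ask(server)` returns that server's reply ("SERVFAIL", "NXDOMAIN", an
    answer, ...) or None if it timed out or was unreachable.
    """
    for server in servers:
        reply = ask(server)
        if reply is None:
            continue  # unreachable or timed out: try the next server
        if reply == "SERVFAIL":
            continue  # this server failed to answer: try the next one
        if not validate(reply):
            continue  # data judged invalid (e.g. failed validation)
        return reply  # includes NXDOMAIN/NODATA, which are valid answers
    return "SERVFAIL"  # no server produced a usable answer
```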

Solution 2:

I don't know of any specific guidelines except those defined in section 6.1.3.3, "Efficient Resource Usage", of RFC 1123: http://tools.ietf.org/rfcmarkup?rfc=1123#page-77

There, a timeout value of "no less than 5 seconds" is specified. The RFC also states that temporary failures should be cached, to prevent an excessive number of DNS requests from clients that violate section 2.2 of the RFC. That section states that clients should wait a "reasonable" amount of time between retries after soft failures.
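A client-side sketch that respects those two points might look like this. The 5-second floor is the value quoted above; the doubling between attempts and the 60-second hold time for cached temporary failures are my own assumptions, not values the RFC mandates, and both helper names are hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional

MIN_TIMEOUT = 5.0  # lower bound quoted above from RFC 1123, section 6.1.3.3


def retry_schedule(first: float = MIN_TIMEOUT, attempts: int = 3, factor: float = 2.0) -> List[float]:
    """Per-attempt timeouts; doubling is an assumed backoff, floored at MIN_TIMEOUT."""
    first = max(first, MIN_TIMEOUT)
    return [first * factor ** i for i in range(attempts)]


class TempFailureCache:
    """Remember soft failures briefly so fast-retrying clients don't hammer servers."""

    def __init__(self, hold: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._hold = hold  # assumed hold time; the exact value is a local choice
        self._clock = clock
        self._until: Dict[str, float] = {}

    def record_failure(self, name: str) -> None:
        self._until[name] = self._clock() + self._hold

    def should_skip(self, name: str) -> bool:
        until: Optional[float] = self._until.get(name)
        return until is not None and self._clock() < until
```

With the defaults, `retry_schedule()` yields timeouts of 5, 10 and 20 seconds, so a client gives up after 35 seconds in total rather than flooding a busy server with back-to-back retries.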

There is also a Stack Overflow thread about this topic, but it adds little beyond some real-world observations: https://stackoverflow.com/questions/3036054/ideal-timeout-period-for-dns-lookup

That's all I can say about this topic. If someone else has more to add, I'd be interested as well.