What happens when your TTL gets screwed up in your DNS record?

What happens when someone gets access to your DNS control and sets a TTL of 100 years on your domain, while pointing its IP to some obscure website?

(and you discover it too late of course)


Ryan has provided an excellent answer to one interpretation of your question. Given our target audience, however, and the situation of the people most likely to stumble upon this question, I'm going to answer a different one.

What does a company do when a bad TTL makes it out into the wild?

You have a few options here. First and foremost though, you need to identify the problem vector and eliminate it. Trying to contain the damage is pointless when you have no control over the problem repeating itself.

  1. Wait. If it's not a crucial record, you can probably wait it out. As Ryan has covered, the "maximum damage" is not 68 years; in practice it is most likely 7 days. This is the most common default for the maximum life of a positive cache entry (BIND, JunOS, etc.). Even in cases where this is not accurate, one would hope the server is receiving routine security updates that force a process restart. Speaking as the operator of several large clusters, I do not find it likely that an MSO would set this to a larger value on purpose: it only serves to generate more external inquiries (which we hate). You may have to move on to the next steps for companies using less popular software, or operators who hate themselves.
  2. Annoy DNS cache operators. If you need to get the record cleared from cache ASAP, your only real choice is to start reaching out to the largest providers of recursive DNS you can think of and work your way down. Some of these companies are likely to ignore you: either they think your company is too small for their customers to care about, or they institute cache purging policies of their own to minimize the number of support calls they have to deal with. In the latter case, they will probably shrug and let the problem take care of itself at the scheduled time. Your company did create this problem for itself, after all.
  3. Get ISP customers to annoy their ISP for you. If it's been a few days and a large ISP is ignoring the cached record, try to get one of their customers to complain and generate a ticket internal to that company. This is harder for them to ignore, but it will not win you any favors with their ops team as from their perspective you did this to yourself. If this is a repeat occurrence, they will probably start canceling these tickets just to spite you.
  4. Advise your partners to bypass the DNS record. If it's a mission-critical DNS record consumed by your partners and none of the above options are acceptable (i.e., you are bleeding revenue by the minute), your company has no choice but to work with its partners to bypass the problem. If they do not control their local cache, this is usually accomplished by inserting entries into the hosts table of the affected systems, as it avoids the need to modify the programs that are using the DNS record. This is only viable if the revenue loss is tied to a select few companies consuming the data. In all other cases you're stuck with the first three options.
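For option 4, the hosts entries you hand to partners are just an address followed by the hostname it should resolve to. A minimal sketch, assuming you maintain the known-good addresses yourself; the function name `hosts_entries` and the hostname/address in the usage note are hypothetical placeholders, not anything from a real deployment:

```python
import ipaddress


def hosts_entries(overrides: dict[str, str]) -> str:
    """Render hosts-file lines pinning each hostname to a known-good address.

    `overrides` maps hostname -> IP address string (hypothetical input that
    you supply). Lines are sorted by hostname so the output is stable.
    """
    lines = []
    for hostname, address in sorted(overrides.items()):
        # ip_address() raises ValueError on a malformed address, so a typo
        # never ends up in a partner's hosts table.
        ip = ipaddress.ip_address(address)
        lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines) + "\n"
```

For example, `hosts_entries({"api.example.com": "192.0.2.10"})` yields the single line `192.0.2.10	api.example.com`. Remember that these entries override DNS entirely, so they have to be removed once the bad record has aged out of the caches.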

Well, first of all, the BIND configuration manual I'm looking at states that TTL is a signed 32-bit integer, expressed in seconds, giving it a theoretical maximum of 2^31 - 1. It says

Valid TTLs are of the range 0-2147483647 seconds.

Or approximately 68 years. So you probably cannot set it to 100 years in the first place.
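The arithmetic behind "approximately 68 years" is easy to check. This sketch uses the 0-2147483647 range quoted above and approximates a year as 365.25 days, which is an assumption for illustration only:

```python
MAX_TTL = 2**31 - 1  # 2147483647 seconds, the upper bound quoted from the BIND manual
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # approximation: average Julian year


def ttl_in_years(ttl: int) -> float:
    """Convert a TTL in seconds to (approximate) years."""
    return ttl / SECONDS_PER_YEAR


def is_valid_ttl(ttl: int) -> bool:
    """True if the TTL falls inside the quoted 0-2147483647 range."""
    return 0 <= ttl <= MAX_TTL


hundred_years = int(100 * SECONDS_PER_YEAR)
print(round(ttl_in_years(MAX_TTL), 2))      # 68.05
print(hundred_years, is_valid_ttl(hundred_years))  # 3155760000 False
```

A 100-year TTL works out to roughly 3.16 billion seconds, which is well past the top of the range, so the zone would not accept it as written.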

So, let's say you set it to 68 years. It's pretty clear what would happen. DNS resolvers that respected the extremely long TTL on your DNS records would cache them for as long as they could. Some DNS resolvers don't respect TTLs at all and just implement their own caching policy however they wish.

The reason we can't put a single hard number on the maximums is that there are many different implementations of DNS created by many different vendors, and they all use slightly different limits. For instance, a DNS server running on Juniper JunOS will only go up to 604800 seconds, or 7 days, on the TTL.
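To see why the cap matters more than the published TTL, here is a minimal sketch of how a resolver that honours TTLs up to a ceiling would decide when a cached answer expires. It assumes the resolver simply clips the record's TTL to its own maximum; the 604800-second value is the JunOS limit mentioned above, `None` stands in for a hypothetical resolver with no ceiling at all, and the function names are illustrative. Resolvers that ignore TTLs entirely and apply their own policy are not modelled here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

JUNOS_MAX_TTL = 604800  # 7 days, the JunOS ceiling described above


def effective_ttl(record_ttl: int, cap: Optional[int]) -> int:
    """TTL the resolver actually uses: the published TTL clipped to its cap.

    cap=None models a (hypothetical) resolver that honours any TTL as published.
    """
    return record_ttl if cap is None else min(record_ttl, cap)


def cache_expiry(fetched_at: datetime, record_ttl: int, cap: Optional[int]) -> datetime:
    """When an answer fetched at `fetched_at` drops out of that resolver's cache."""
    return fetched_at + timedelta(seconds=effective_ttl(record_ttl, cap))


fetched = datetime(2024, 1, 1, tzinfo=timezone.utc)
bad_ttl = 2**31 - 1  # the largest TTL in the quoted range
print(cache_expiry(fetched, bad_ttl, JUNOS_MAX_TTL))  # 2024-01-08 00:00:00+00:00
```

With the 7-day ceiling, even the largest legal TTL expires a week after the resolver fetched it, whereas an uncapped resolver would hold the same answer for about 68 years.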