Cache updates when migrating DNS from one provider to another

This may be a Windows DNS-specific question or a general DNS best-practice question; I'm not sure!

We migrated our 3rd party DNS provision from provider A to provider B.

I noticed that our internal recursive Windows DNS servers still had NS records cached for our domains pointing to provider A's servers. This was despite my having changed the nameservers with our registrar several days earlier, and despite the properties of the cached records showing a TTL of 1 day.

After 24 hours, when the NS records in this cache have expired, will the DNS server go back to the TLD servers for an update on the authority, or will it prefer dns1.providera.com since that is what it has cached?

In this case I arranged to leave Provider A's servers up for a week to allow changes to propagate, so dns1.providera.com is still active and would still provide NS and SOA records that said that dns1.providera.com. was in charge of this domain. Given this fact, would the Windows DNS server ever go back to the TLD and pick up the authority changes, or would it just assume all was well and renew timestamps on its cached NS records?

I wonder what would be the best approach to ensuring that caches pick this up. Should I:-

(1) Leave Provider A's servers in place and active and wait for caches to catch up. This is basically what we're doing now, and it seems to have issues, perhaps specifically for Windows servers, or perhaps more widely.

(2) Leave Provider A's servers in place but change the NS and/or SOA information they provide to tell caches that new servers are in charge.

(3) Remove Provider A's servers after 2*TTL to force remaining caches to update.

The issue with (2) is that on Provider A's system I can't seem to change the NS or SOA information to anything other than their servers.

The issue with (3) is that I'm not sure how a DNS server would behave in this case. When it couldn't reach the cached name servers, would it flush its cache and try a full recursive lookup, or would it just return an error, forcing the user to clear the cache manually?

Thanks in advance!


Solution 1:

The general architecture/flow of these updates is:

  • After you update the records with your registrar, they will update the registry database, which will in turn update the TLD's stealth primary nameserver.

  • Updates will flow from the stealth primary to the secondary servers that actually reply to queries. This happens within the TLD SOA refresh interval, unless there are failures, in which case the TLD SOA expire interval starts ticking.

  • If everything is hunky-dory on their end, these updates propagate within at most the TLD SOA refresh interval, and your updated record appears on the public-facing TLD nameservers.

  • If your resolver queried before the updated record appeared on the public-facing TLD nameservers, it will have cached the old record, and you'll have to wait for that record's TTL to expire before you get the updated one.
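The last point can be sketched as a tiny timing model. This is only an illustration under stated assumptions: the function name `first_fresh_fetch` is hypothetical, times are plain seconds on an arbitrary clock, and the resolver is assumed to go back to the TLD servers as soon as its cached record expires (real resolvers, including Windows DNS, may behave differently).

```python
def first_fresh_fetch(cached_at: int, published_at: int, ttl: int) -> int:
    """Return the time at which a cache first fetches the updated record.

    cached_at    -- when the resolver last fetched the record from the TLD servers
    published_at -- when the updated record appeared on the public TLD servers
    ttl          -- TTL of the cached record, in seconds

    Assumption: the resolver re-queries the TLD servers the moment the
    cached copy expires, and never earlier.
    """
    if ttl <= 0:
        raise ValueError("ttl must be positive")
    if cached_at >= published_at:
        # The cache was filled after the update, so it already holds the new record.
        return cached_at
    # Number of whole TTL periods until an expiry falls on or after publication.
    delta = published_at - cached_at
    periods = -(-delta // ttl)  # integer ceiling division
    return cached_at + periods * ttl


if __name__ == "__main__":
    # Cached just before the update went live, with a one-day TTL.
    print(first_fresh_fetch(cached_at=0, published_at=1000, ttl=86400))
```

In this model a resolver that cached the old record shortly before the update waits almost a full TTL, which is where the "refresh + record TTL" bound comes from.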

In conclusion:

  • If all systems are go, then you only need to wait for a maximum time of TLD SOA refresh.

  • If you made your query via your caching/recursive server too early, you may need to wait for TLD SOA refresh + record TTL.

  • If there is an outage, then you may need to wait longer.

  • If systems come back at the last possible moment after an outage, you shouldn't need to wait longer than TLD SOA expire + record TTL. This accounts for the case where you made the query before the updated records were published to the public-facing TLD nameservers.

  • Because most caching/recursive servers will cache your zone's records as well, and your (enterprise?) DNS provider is in all likelihood going to have secondary servers as well, you'll have to add your own SOA refresh before you start seeing changes to your own zone come through the new servers. Of course, as I've done before, you could update both old and new servers for your own zone.
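To put numbers on these bounds, here is a rough Python sketch. It assumes you already have the TLD's SOA record data as a text string in the standard field order (primary server, contact, serial, refresh, retry, expire, minimum). The names `SoaTimers`, `parse_soa_rdata` and `worst_case_waits` are hypothetical, and the SOA values in the example are made up for illustration, not taken from any real TLD.

```python
from typing import Dict, NamedTuple


class SoaTimers(NamedTuple):
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa_rdata(rdata: str) -> SoaTimers:
    """Extract the timer fields from SOA record data.

    Expected field order: mname rname serial refresh retry expire minimum.
    """
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(fields)}")
    refresh, retry, expire, minimum = (int(f) for f in fields[3:7])
    return SoaTimers(refresh, retry, expire, minimum)


def worst_case_waits(tld_soa: SoaTimers, record_ttl: int,
                     own_refresh: int = 0) -> Dict[str, int]:
    """Upper bounds (in seconds) for the cases listed above.

    own_refresh is your own zone's SOA refresh, added when the new
    provider's secondaries also have to pick up the zone.
    """
    return {
        "all_systems_go": tld_soa.refresh + own_refresh,
        "queried_too_early": tld_soa.refresh + record_ttl + own_refresh,
        "after_outage": tld_soa.expire + record_ttl + own_refresh,
    }


if __name__ == "__main__":
    # Illustrative SOA data only; substitute what your TLD actually returns.
    soa = parse_soa_rdata(
        "a.nic.example. hostmaster.nic.example. 2024010101 1800 900 604800 86400"
    )
    for case, seconds in worst_case_waits(soa, record_ttl=86400).items():
        print(f"{case}: {seconds} s")
```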

What you could do:

  • You could use a tool like dig or nslookup to query the public-facing TLD nameservers directly to find out whether your records have updated. The SOA record in the response will also tell you your TLD's timer values (refresh, retry, expire).

  • You could use the same tools to query your new DNS provider's secondary servers to find out if they have picked up the change.

  • Do a full recursive query via public nameservers (they can refuse to recurse for you, but most don't) to see if the new query chain is working well.

  • Do a full recursive query locally from your client workstation. dig allows you to do this (with its +trace option) and will help you determine whether your resolution chain leads where you expect.

  • DNS can get daunting. Write comments to my response so I can make it more comprehensible. I'll look at it later in the day to see if I can improve on what I've written.
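The checks above could be scripted along these lines. This is a minimal sketch that only builds and prints the dig command lines; it does not run them. The function name `build_dig_checks` is hypothetical, and the domain, TLD server, provider B nameserver and public resolver shown are placeholders to replace with your own values.

```python
import shlex
from typing import List


def build_dig_checks(domain: str, tld_server: str,
                     new_provider_ns: str, public_resolver: str) -> List[List[str]]:
    """Return dig command lines for the four checks, in order."""
    return [
        # 1. Ask a TLD nameserver directly which NS records it delegates to.
        ["dig", f"@{tld_server}", domain, "NS", "+norecurse"],
        # 2. Ask the new provider's server for the zone's SOA.
        ["dig", f"@{new_provider_ns}", domain, "SOA", "+norecurse"],
        # 3. Resolve through a public recursive resolver.
        ["dig", f"@{public_resolver}", domain, "NS"],
        # 4. Walk the delegation chain from the root, starting locally.
        ["dig", "+trace", domain, "NS"],
    ]


if __name__ == "__main__":
    # Placeholder values, not taken from the question.
    commands = build_dig_checks(
        domain="example.com",
        tld_server="a.gtld-servers.net",
        new_provider_ns="dns1.providerb.com",
        public_resolver="8.8.8.8",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or passed to `subprocess.run` if you want to automate the comparison.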