AWS Route 53 alias-record change takes too long
I have set up a stack in AWS CloudFormation, which is up and running. The stack contains an ELB (load balancer) and a few EC2 instances. When we do a new deployment of our application, we build a new environment and delete the old one. Therefore, we have to update the DNS record in Route 53. I am wondering how long the old stack should stay alive.
I have followed the AWS docs to use a subdomain without migrating the parent domain. The parent domain's NS records (pointing to AWS DNS) have a TTL of 3600. Inside Route 53, I have set up an A record with an alias to the load balancer in the stack (I cannot set a TTL here).
I cannot enter a TTL for the alias record in the AWS console. Some sources, however, say that the change might take up to 60 seconds.
I just did some tests locally, to check how long it takes for the DNS to pick up the new stack. This is the time between updating the alias DNS record in Route 53 and being able to reach the new stack in my browser:
- Try #1: ~4 minutes
- Try #2: ~9 minutes
- Try #3: ~7 minutes
- Try #4: ~15 minutes
Shouldn't this be under 60 seconds? What is the maximum time this can take for all clients? Is it possible to reduce this time? What is a safe time to delete the old stack?
Solution 1:
Firstly, it's important to recognise that DNS records are cached on the client and on its DNS resolver, both of which are outside your control (note that I'm referring to the cached DNS records, not your authoritative name server). Therefore it's up to the client and its DNS resolver to honour your TTL.
A new visitor who has never visited your site before, and whose DNS resolver has not cached your records (or who visited long enough ago that the cache has expired), will see the new records immediately.
Shouldn't this be under 60 seconds?
It should, but only if the client honours the TTL. Some clients enforce a minimum TTL, and some networks also have a DNS resolver which may be caching results.
Is it possible to reduce this time?
You have to remember that the majority of your visitors (assuming this is a public site) won't have been sitting there loading your site every few seconds like you have. Most of your visitors probably won't have visited the site recently, and their DNS resolver may not have the records in its cache. Most DNS resolvers should respect your TTL, but you can't guarantee this.
What is a safe time to delete the old stack?
You're better off judging this by how much traffic is still being served by the old stack rather than by the DNS TTL. If you're using ELB, you should be able to view how many requests per second are being served by the old ELB in CloudWatch. Wait until this drops below an acceptable level, then delete it.
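As a rough sketch of that check, the snippet below reads the `RequestCount` metric for the old load balancer with boto3 and decides whether traffic has dropped far enough. It assumes a Classic ELB (the `AWS/ELB` namespace and `LoadBalancerName` dimension), credentials already configured for boto3, and a hypothetical load balancer name; the 15-minute window and the threshold of 5 requests are arbitrary illustrations, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def fetch_datapoints(load_balancer_name, minutes=15):
    """Fetch per-minute RequestCount sums for a Classic ELB from CloudWatch."""
    import boto3  # third-party; imported lazily so is_drained() works without it

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/ELB",
        MetricName="RequestCount",
        Dimensions=[{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Sum"],
    )
    return response["Datapoints"]


def is_drained(datapoints, max_requests=5):
    """True if the total number of requests in the window is at most max_requests.

    Minutes without requests may have no datapoint at all, so the sums are
    added up and a missing minute simply contributes zero.
    """
    total = sum(point["Sum"] for point in datapoints)
    return total <= max_requests


# Hypothetical usage; "old-stack-elb" is a placeholder name:
# if is_drained(fetch_datapoints("old-stack-elb")):
#     print("Old stack is serving almost no traffic")
```

Note that an empty result also comes back if the load balancer name is wrong, so it is worth confirming that the same query shows traffic before the switch.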
For the sake of viewing the new stack immediately after the switch I recommend just manually flushing your local DNS cache. Leaving your own client to expire records naturally for the sake of seeing how long it takes probably won't be indicative of how long it takes for other clients.
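After flushing, a quick way to see what your own resolver currently returns is to compare the addresses for the site with the addresses for the new load balancer. This is only a sketch using the Python standard library: both hostnames in the usage comment are hypothetical placeholders, and because a load balancer's addresses can vary between queries, a mismatch is a hint rather than proof.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def points_at(site_addresses, elb_addresses):
    """True if at least one address for the site also belongs to the ELB."""
    return bool(set(site_addresses) & set(elb_addresses))


# Hypothetical usage; both hostnames are placeholders:
# site = resolve_ipv4("app.example.com")
# new_elb = resolve_ipv4("new-stack-elb.example.elb.amazonaws.com")
# print("Resolver sees the new stack:", points_at(site, new_elb))
```

This only reflects the resolver your machine uses, which is the same limitation as timing the switch in your own browser.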
Edit: I noticed Google's Public DNS has a tool to let you flush the cache:
https://developers.google.com/speed/public-dns/cache
This may speed things up as a significant proportion of clients are likely to be using this.