How is 8.8.8.8 kept *always* alive?
I know how you can manage datacenter redundancy when there's a working DNS server that can point clients to any working site of your company – there's VRRP, multi-WAN, etc. But how are the DNS servers themselves kept online? They're the first thing hit when someone connects to a service, and they can't really be provisioned that way. Take 8.8.8.8 or 8.8.4.4, for example: I can't recall them ever being down. How do operators keep such IPs always online?
I know it's probably a really broad question, but I'd like to hear just the names of protocols/techniques that can be used for this. I can read the details on my own.
Solution 1:
First of all, VRRP does not depend on DNS in any way. For redundancy within a single site you can run DNS servers on a shared VRRP address just fine.
But as others have mentioned in comments, the services also use anycast routing, which essentially means that the same IP address exists in multiple places around the world. When a whole site goes down, routes world-wide are recalculated so that your packets end up going to another working site.
A better example than Google's public DNS would be the root DNS servers – the ones which serve the root (".") zone and hold pointers to com, org, eu, and so on – because each of the 13 logical root addresses is actually announced from many instances world-wide. ICANN's "L" root alone is served by 160 different sites!
Note that anycast has nothing to do with DNS-based round-robins (where the same name has multiple addresses). Anycast is done essentially by lying to the routing protocol.
The Internet uses BGP to exchange routing information between organizations.
BGP inherently supports selecting the best out of several routes towards the same network, based on various criteria. For example, the same customer might have redundant uplinks to the same ISP (announcing two routes differing only in weight/preference). Or the customer might have uplinks through several ISPs, and everyone will select their preferred route (mainly shortest AS-path) – that's the gist of "true" multi-WAN.
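As a rough illustration of that selection step, here is a hedged sketch in Python. Real BGP compares many attributes in order (local preference, AS-path length, origin, MED, and more); this model keeps only local preference and AS-path length, and all router/ISP names and AS numbers are made up for the example.

```python
# Simplified BGP best-path selection: prefer the highest local-pref,
# then the shortest AS path. This is a toy model, not a full BGP
# decision process.

def best_route(routes):
    """Pick the preferred route from a list of candidate routes."""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

# Two candidate routes towards the same prefix (hypothetical AS numbers):
routes_to_dns = [
    {"via": "ISP-A", "local_pref": 100, "as_path": [64500, 15169]},
    {"via": "ISP-B", "local_pref": 100, "as_path": [64501, 64510, 15169]},
]

print(best_route(routes_to_dns)["via"])  # equal local-pref, so the shorter AS path wins
```

With equal local preference, the route through ISP-A wins because its AS path is one hop shorter – the same tie-break that makes "nearest" anycast instances attract traffic.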
Multihoming
┌────────[AS 65535]────────┐
client 1 ---ISP---│--BGProuter--+ │
¦ │ ¦--DNSserver │
client 2 ---ISP---│--BGProuter--+ │
└──────────────────────────┘
However, BGP only leads the traffic to your "entrance" – it does not care what happens beyond that. So if you internally set up both routes towards the same server, you get multihoming. But if each entrance leads to a different server (configured with the same IP), you get anycast.
Anycast... kind of?
┌────────[AS 65535]────────┐
client 1 ---ISP---│--BGProuter-----DNSserver │
¦ │ │
client 2 ---ISP---│--BGProuter-----DNSserver │
└──────────────────────────┘
Importantly, this also means that BGP doesn't care if the AS isn't contiguous at all. To get world-wide redundancy, just announce the same network from multiple physical locations – if you connect those locations together (so that they route that network to one place), you get multihoming; if they're islands, you get anycast.
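The "islands" behavior above can be sketched in a few lines: each site announces the same prefix, a given client's routers pick the announcement with the shortest AS path, and when a site withdraws its announcement the client reconverges to the next-nearest surviving site. Site names and path lengths here are hypothetical.

```python
# Toy anycast model: the same prefix is announced from several sites;
# a client ends up at whichever announcement has the shortest AS path
# as seen from its position in the network.

def chosen_site(announcements):
    """Return the site whose announcement has the shortest AS path."""
    return min(announcements, key=announcements.get)

# site -> AS-path length as seen by one particular client
announcements = {
    "site-eu": 2,
    "site-us": 4,
}

assert chosen_site(announcements) == "site-eu"

# The EU site goes down and its route is withdrawn; after the routing
# tables reconverge, the same destination IP now leads to the US site.
del announcements["site-eu"]
assert chosen_site(announcements) == "site-us"
```

The client never changes the IP address it talks to – only the route behind that address changes, which is exactly why 8.8.8.8 appears to never go down.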
Anycast
┌────────[AS 65535]────────┐
client 1 ---ISP---│--BGProuter-----DNSserver │
¦ └──────────────────────────┘
¦
¦ ┌────────[AS 65535]────────┐
client 2 ---ISP---│--BGProuter-----DNSserver │
└──────────────────────────┘
(For that matter, it doesn't even need to be the same AS – e.g. 6to4 relays are run by multiple independent organizations, each of them announcing their own route towards 192.88.99.0/24.)
Caveats:
Anycast provides redundancy, but not load-balancing. Once BGP converges, each router will have chosen a single preferred route (or occasionally a few) and will continue using it until the network changes.
However, you cannot predict how long the routes will remain stable, so anycasting stateful services can be tricky. DNS gets away with it due to being stateless and using mainly UDP (EDNS reduced the need for TCP connections).
There must be coordination between the actual service and the BGP router, so that the route is withdrawn if the service crashes.
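That last caveat is usually handled by a small health-check loop next to the BGP speaker (in real deployments, tools such as ExaBGP or BIRD driven by a check script). Here is a hedged sketch of the reconciliation logic only; the health check itself and the actual announce/withdraw commands are placeholders, not a real BGP API.

```python
# Sketch: decide each cycle whether the anycast prefix should be
# announced, withdrawn, or left alone, based on local service health.
# The surrounding daemon would run this periodically and translate
# "announce"/"withdraw" into commands for its BGP speaker.

def reconcile(service_healthy, route_announced):
    """Return the action the BGP speaker should take this cycle."""
    if service_healthy and not route_announced:
        return "announce"   # service recovered: attract traffic again
    if not service_healthy and route_announced:
        return "withdraw"   # service down: let BGP steer clients to
                            # another site announcing the same prefix
    return "noop"           # state already matches health
```

The key property is that a crashed DNS daemon makes this site disappear from the routing table, so clients fall through to the next-nearest anycast instance instead of timing out.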
See also "History of 4.2.2.2. What's the story?" on the NANOG mailing list: post 1, post 2.