Do clients typically implement failover/load-balancing on multiple A records?

Load balancers such as Amazon's Elastic Load Balancers typically publish a DNS record set with multiple A records, one per load balancer instance, so that traffic from clients can be spread across those instances:

$ dig +short my-fancy-elb.us-east-1.elb.amazonaws.com
10.0.1.1
10.0.1.2

If I attempt to curl this URL in verbose mode, I notice that curl seems to round-robin attempts to the two IP addresses:

$ curl -ivs http://my-fancy-elb.us-east-1.elb.amazonaws.com | grep -i 'connected'
* Connected to my-fancy-elb.us-east-1.elb.amazonaws.com (10.0.1.1)
$ curl -ivs http://my-fancy-elb.us-east-1.elb.amazonaws.com | grep -i 'connected'
* Connected to my-fancy-elb.us-east-1.elb.amazonaws.com (10.0.1.2)

Is the fact that curl does round-robin on the A records described in the record set done by the curl binary itself or is it something that the Linux kernel does for it?

TCP exists at layer 4 and DNS exists at layer 7, so I'd imagine that individual binaries and libraries would have to implement their own load-balancing and failover: fetching the DNS record set for the given domain name and choosing an IP address to connect to from that set.

Can I reasonably expect that programming languages, browsers, and libraries like curl will do load-balancing and failover on A records for me?


Solution 1:

The short answer is that it varies.

When multiple address records are present in the answer set, a queried DNS server normally returns them in a randomized order. The operating system will typically present the returned record set to the application in the order in which the records were received. That said, there are options on both sides of the transaction (the nameserver and the OS) which can result in different behaviors. Usually these are not employed. As an example, on glibc-based systems a little-known file called /etc/gai.conf configures how getaddrinfo() sorts the addresses it returns.
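To see the order your own system hands to an application, something like the following Python sketch can help. The function name resolve_ipv4 is made up for this illustration; it only wraps the standard socket.getaddrinfo() call and returns the IPv4 addresses in the order the OS resolver presented them.

```python
import socket

def resolve_ipv4(host, port=80):
    """Return the IPv4 addresses for host, in the order getaddrinfo() gives them.

    resolve_ipv4 is a hypothetical helper name, not part of any library.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) tuple.
    return [info[4][0] for info in infos]
```

Calling it a few times in a row against a name with several A records (for example the ELB hostname from the question) shows whether the order changes between lookups on your machine. The result depends on the nameserver, any local cache, and the resolver's own sorting rules, so treat it as an observation about one system rather than a guarantee.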

The Zytrax book (DNS for Rocket Scientists) has a good summary on the history of this topic, and concludes that RFC 6724 is the current standard that applications and resolver implementations should adhere to.

From here it's worth noting a choice quote from RFC 6724:

   Well-behaved applications SHOULD NOT simply use the first address
   returned from an API such as getaddrinfo() and then give up if it
   fails.  For many applications, it is appropriate to iterate through
   the list of addresses returned from getaddrinfo() until a working
   address is found.  For other applications, it might be appropriate to
   try multiple addresses in parallel (e.g., with some small delay in
   between) and use the first one to succeed.

The standard encourages applications to not stop at the first address on failure, but it is neither a requirement nor the behavior that many casually written applications are going to implement. You should never rely solely on multiple address records for high availability unless you are certain that the greater (or at least most important) percentage of your consuming applications will play nicely. Modern browsers tend to be good about this, but remember that they are not the only consumers that you are dealing with.

(Also, as @kasperd notes below, it's important to distinguish between what this buys you in HA vs. load balancing.)
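The RFC's advice to iterate through the list until a working address is found can be sketched in a few lines of Python. This is a minimal illustration, not how curl or any particular client does it; connect_with_failover and its timeout parameter are names invented here. Python's own socket.create_connection() performs a similar loop for you.

```python
import socket

def connect_with_failover(host, port, timeout=3.0):
    """Try every address from getaddrinfo() in order; return the first
    socket that connects. (Hypothetical helper, for illustration only.)"""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # first working address wins
        except OSError as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    if last_error is None:
        raise OSError("getaddrinfo() returned no addresses")
    raise last_error
```

This gives failover only: the client still starts with whatever address comes first, so any spreading of load comes from the order the resolver returns, not from this loop.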

Solution 2:

My guess is that the DNS TTL for the record is set really low, so curl has to resolve the name again every time and gets a different IP from the DNS server.

Neither curl nor the kernel is at all aware that this DNS-level load balancing is happening, and you can't reasonably expect anything like that from them.

Solution 3:

The basic point is that DNS servers usually cycle the records in a pseudorandom fashion:

fedor@piecka:~$ dig +short @ns1.yahoo.com yahoo.com
206.190.36.45
98.138.253.109
98.139.183.24
fedor@piecka:~$ dig +short @ns1.yahoo.com yahoo.com
98.139.183.24
206.190.36.45
98.138.253.109
fedor@piecka:~$ dig +short @ns1.yahoo.com yahoo.com
98.139.183.24
98.138.253.109
206.190.36.45

In the case of curl, it has its own DNS resolving library, which respects the server-presented order.

There is a story on this topic at https://daniel.haxx.se/blog/2012/01/03/getaddrinfo-with-round-robin-dns-and-happy-eyeballs/. curl's implementation is mentioned there too.

Solution 4:

Is the fact that curl does round-robin on the A records described in the record set done by the curl binary itself or is it something that the Linux kernel does for it?

Neither. It is usually the DNS server that changes the IP address. curl needs to resolve the hostname to an IP address for each request, so it sends a query to the DNS server, which returns a list of IP addresses. The DNS server may also be a local caching resolver on the same machine. Most DNS servers rotate the IP list round-robin on every query, so the address at the top of the list changes and you get a different IP on each request. If you ping www.google.com from a Linux machine, you will likely see a different address each time.

Do clients typically implement failover/load-balancing on multiple A records?

I performed a test with curl, fetching a file over HTTP. curl is able to retry with another IP when the first IP is not reachable (failover). So failover does work with curl for HTTP requests.