Load Balancing Best Practices for Persistence

Solution 1:

The canonical solution to this is to not rely on the end user's IP address, but instead to use a Layer 7 (HTTP/HTTPS) load balancer with "Sticky Sessions" via a cookie.

"Sticky sessions" means the load balancer will always direct a given client to the same backend server. "Via a cookie" means the load balancer (which is itself a fully capable HTTP device) inserts a cookie, which it creates and manages automagically, to remember which backend server a given client's HTTP requests should go to.
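To make the cookie mechanism concrete, here is a minimal Python sketch of the decision a load balancer makes per request. It is an illustration only, not how any particular product implements it; the backend names, the cookie name SERVERID and the function pick_backend are all made up for this example.

```python
import itertools

BACKENDS = ["app1", "app2", "app3"]   # hypothetical backend names
COOKIE_NAME = "SERVERID"              # hypothetical cookie name

_round_robin = itertools.cycle(BACKENDS)

def pick_backend(request_cookies):
    """Return (backend, cookies_to_set) for one incoming request."""
    pinned = request_cookies.get(COOKIE_NAME)
    if pinned in BACKENDS:
        # Returning client: honor the cookie, nothing new to set.
        return pinned, {}
    # New client, or the cookie names a backend that no longer exists:
    # assign the next backend in rotation and pin it with a cookie.
    backend = next(_round_robin)
    return backend, {COOKIE_NAME: backend}

# The first request carries no cookie, so a backend is assigned.
backend, set_cookies = pick_backend({})
# Later requests send the cookie back and land on the same backend,
# regardless of which IP address they arrive from.
again, _ = pick_backend(set_cookies)
assert again == backend
```

Note that the client's IP address never enters the decision, which is why this approach survives mobile clients changing addresses.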

The main downside to sticky sessions is that backend server load can become somewhat uneven. The load balancer can only distribute load fairly when new connections are made. Since existing connections may be long-lived in your scenario, there will be periods when load is not distributed entirely fairly.
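A toy Python example of how this imbalance arises, assuming six hypothetical clients pinned round-robin to three hypothetical backends, three of which then disconnect:

```python
from collections import Counter
from itertools import cycle

backends = ["app1", "app2", "app3"]            # hypothetical backend names
clients = [f"client{i}" for i in range(6)]     # hypothetical clients

# New sessions are spread evenly: two clients pinned to each backend.
pinned = dict(zip(clients, cycle(backends)))

# Three clients disconnect; the survivors stay pinned where they are.
for gone in ("client0", "client3", "client1"):
    del pinned[gone]

load = Counter(pinned.values())
print({b: load[b] for b in backends})
# prints {'app1': 0, 'app2': 1, 'app3': 2}
```

The balancer cannot move the remaining pinned clients, so the gap only closes as new clients arrive and are assigned.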

Just about every Layer 7 load balancer should be able to do this. On Unix/Linux, some common examples are nginx, HAProxy, Apsis Pound, Apache 2.2 with mod_proxy, and many more. On Windows Server 2008+ there is Microsoft Application Request Routing. As appliances, Coyote Point, loadbalancer.org, Kemp and Barracuda are common at the low end; F5, Citrix NetScaler and others cover the high end.

Willy Tarreau, the author of HAProxy, has written a nice overview of load balancing techniques.

About the DNS Round Robin:

"Our intent was for the Round Robin DNS TTL value for our api.company.com (which we've set at 1 hour) to be honored by the downstream caching nameservers, OS caching layers, and client application layers."

It will not be, and DNS Round Robin isn't a good fit for load balancing anyway. If nothing else convinces you, keep in mind that modern clients may prefer one host over all the others due to longest prefix match pinning, so if a mobile client changes its IP address, it may switch to another RR host.

Basically, it's okay to use DNS round robin for coarse-grained load distribution, by pointing 2 or more RR records at highly available IP addresses that are handled by real load balancers in active/passive or active/active HA. If that's what you're doing, you might as well serve those DNS RR records with long Time To Live values, since the associated IP addresses are highly available already.

Solution 2:

To answer your question about alternatives: You can get solid layer-7 load balancing through HAProxy.

As far as fixing the LVS affinity issues, I'm a bit short on solid ideas. It could be as simple as a timeout or overflow. Some mobile clients will switch IP addresses while they're connected to the network; perhaps this is the source of your woes? I would suggest, at the very least, that you widen the affinity granularity to at least a class C (/24) network.
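To illustrate what class C granularity means, here is a rough Python sketch that keys affinity on the client's /24 network instead of its full address. It is not LVS code; the function names, backend names and the CRC-based hash are assumptions made for this example, and the addresses come from documentation ranges.

```python
import ipaddress
import zlib

BACKENDS = ["app1", "app2", "app3"]   # hypothetical backend names

def affinity_key(client_ip, prefix_len=24):
    """Collapse a client IPv4 address to its enclosing network."""
    # strict=False lets us pass a host address and get its network back.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    return str(network)

def backend_for(client_ip, backends, prefix_len=24):
    """Map every address in the same network to the same backend."""
    key = affinity_key(client_ip, prefix_len)
    # crc32 is used only because it is stable across runs, unlike hash().
    return backends[zlib.crc32(key.encode()) % len(backends)]

print(affinity_key("203.0.113.17"))    # prints 203.0.113.0/24
print(affinity_key("203.0.113.200"))   # prints 203.0.113.0/24
assert backend_for("203.0.113.17", BACKENDS) == backend_for("203.0.113.200", BACKENDS)
```

This only helps when a client's new address falls inside the same /24. A client that hops to a different network still loses affinity, which is the case the cookie approach handles.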