What is a typical method to scale out a software load balancer?

I often see web app architectures with an SLB / reverse proxy in front of a bunch of app servers.

What happens when the number of connections to the SLB requires too many resources for a single SLB to handle effectively? For a concrete yet over-the-top example, consider 2 million persistent HTTP connections. Clearly a single SLB cannot handle this.

What is the recommended configuration for scaling out an SLB?

Is it typical to create a group/cluster of LBs? If so, how is client load spread out between the group of LBs?


Load balancers can't easily be scaled out by other load balancers, since somewhere in the chain there will inherently be a single load balancer maintaining the connections. That said, balancers such as LVS or HAProxy have absurd capacity in the Gbps range. Once you get beyond the capabilities of a single load balancer (software, hardware, whatever), you'll need to move on to other techniques such as round robin DNS.
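To illustrate the round robin DNS part: you simply publish several A records for the same name, one per load balancer (or load balancer pair), and resolvers hand them out in rotating order so new clients get spread roughly evenly across them. A made-up zone fragment (names, addresses, and TTL are hypothetical):

www    300    IN    A    192.0.2.10   ; load balancer (pair) 1
www    300    IN    A    192.0.2.11   ; load balancer (pair) 2
www    300    IN    A    192.0.2.12   ; load balancer (pair) 3

Keep the TTL low-ish so an address can be pulled out of rotation reasonably quickly, but remember that clients and resolvers cache, so DNS round robin on its own is a coarse tool for failover.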


OK, there is already an accepted answer, but there is something to add. The most common 'classical' ways of scaling the load balancer tier are (in no particular order):

  • DNS Round Robin to publish multiple IP addresses for the domain. For each IP address, implement a highly available server pair (2 servers cooperating on keeping one IP address working at all times). Each IP corresponds to one load balancer cluster, either using appliances or servers with load balancing software. Scale horizontally by adding more load balancer pairs as needed.

  • Routing or firewall tweaks to spread load to multiple load balancers. Have the front router or front firewall spread the incoming connections to several IP addresses (each representing one load balancer pair) by hashing the source IP address, having multiple equal-cost routes to the load balancers, or similar.

  • A layer of IP level load balancers in front of a layer of HTTP level load balancers. IP-layer load balancing can be implemented in ASICs / silicon, and can be wicked fast for some things. Thus a single IP load balancer pair can often 'keep up' with several HTTP/HTTPS level load balancers, and provide multi-gigabit performance levels while keeping the architecture nice and simple.

Going into depth on all the different ways of doing the above would require a very long answer. But in general, it is not that hard to scale the load balancer tier; it is far harder to scale the application server tier and especially the database tier.

Whether you choose an appliance form factor (F5, Cisco, A10) or a generic server (Windows / Linux + software) matters less. The major considerations when scaling out the load balancer layer are:

  • Stateful versus stateless. Do you absolutely need sticky sessions, or can you live without them? Not keeping state makes everything simpler.
  • 'Hardware' (ASICs) versus 'software' (general purpose servers) for load balancing. Each has its pros and cons; see the HAProxy overview documentation for a good discussion of the tradeoffs.
  • L3/4 (IP / TCP/IP) load balancing versus L7 (HTTP) load balancing. Again, pros and cons; the HAProxy doc provides a good overview, and there is a minimal config sketch after this list.
  • SSL termination: where does it happen, on the web nodes or on the load balancer?
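To make the L4-versus-L7 and SSL-termination points concrete, here is a minimal HAProxy sketch (addresses, server names, and the certificate path are all made up; treat it as an illustration of the two modes, not a production config):

defaults
    timeout connect 5s
    timeout client  30s
    timeout server  30s

# L4 (TCP) balancing: HAProxy only shuffles bytes, no HTTP parsing, no stickiness
frontend ft_l4
    bind 192.0.2.10:80
    mode tcp
    default_backend bk_l4

backend bk_l4
    mode tcp
    balance roundrobin
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check

# L7 (HTTP) balancing: HAProxy parses requests, so it can terminate SSL,
# insert a cookie for sticky sessions, route on headers, etc. -- at more CPU cost
frontend ft_l7
    bind 192.0.2.10:443 ssl crt /etc/haproxy/site.pem
    mode http
    default_backend bk_l7

backend bk_l7
    mode http
    balance roundrobin
    cookie SRV insert indirect nocache
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2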

Generally, you don't need to worry about this before your website gets very large -- a single modern server running e.g. nginx will handle tens of thousands of plain HTTP requests per second. So don't optimize prematurely; don't deal with this before you have to.


The key to scaling an HTTP load balancing layer is to add another layer of lower-level (IP or TCP) load balancing first. This layer can be built entirely with open-source software, although you'll get better results if you have modern routers.

The flows (TCP sessions) should be hashed using headers such as source/destination IP and TCP ports, to decide which frontend they go to. You also need a mechanism to make sure that when a frontend dies, it stops getting used.

There are various strategies; I'm going to outline a couple that I've used in production on sites serving millions of users, so you can get the idea. It would take too long to explain everything in detail, but I hope this answer will give you enough information/pointers to get started. To implement these solutions you're going to need someone who is really knowledgeable about networking.

Admittedly, what I'm describing here is much harder to implement than what is described in other answers, but this really is the state of the art if you have a high-traffic website with big scalability issues and availability requirements above 99.9%. Provided you already have a network engineer type on board, it costs less to set up and run (both in capex and opex) than load balancer appliances, and it can be scaled further at almost no additional cost (vs. buying a new, even more expensive appliance when you outgrow your current model).

First strategy: with a firewall

Presumably you have a couple of routers on which your ISP uplinks are connected. Your ISP provides 2 links (active/passive, using VRRP). On your routers, you also use VRRP, and you route the traffic going to your public network to a firewall. The firewalls (FW 1 and FW 2 below) are also active/passive; they filter the traffic and send each flow to a healthy frontend server (your HTTP load balancers, FE 1 and FE 2 below).

      +--------------+       +--------------+
      | ISP router A |       | ISP router B |
      +--------------+       +--------------+
             |                      |
           ==#======================#==   (public network)
             |                      |
      +---------------+     +---------------+
      | Your router A |     | Your router B |
      +---------------+     +---------------+
             |                      |
           ==#=====#==========#=====#==   (RFC 1918 private network)
             |     |          |     |
       +------+ +------+  +------+ +------+
       | FW 1 | | FE 1 |  | FE 2 | | FW 2 |
       +------+ +------+  +------+ +------+

The goal is to have a flow look like this:

  1. ISP routes traffic going to your IPs to your active router.
  2. Your routers route the traffic to a VIP that uses an RFC 1918 address. This VIP is owned by the active firewall, much like VRRP. If you use OpenBSD for your firewall needs, then you can use CARP, a patent-free alternative to VRRP/HSRP.
  3. Your firewall applies the filter (e.g. "only allow 80/tcp and 443/tcp going to this particular IP address").
  4. Your firewall also acts as a router and forwards the packets to a healthy frontend.
  5. Your frontend terminates the TCP connection.

Now the magic happens in steps 4 and 5, so let's see in more detail what they do.

Your firewall knows the list of frontends (FE 1 and FE 2), and it will pick one of them based on a particular aspect of the flow (e.g. by hashing the source IP and port, among other headers). But it also needs to make sure that it's forwarding traffic to a healthy frontend, otherwise you will blackhole traffic. If you use OpenBSD, for instance, you can use relayd. What relayd does is simple: it health-checks all your frontends (e.g. by sending them a probe HTTP request), and whenever a frontend is healthy it adds it to a table that the firewall uses to select the next hop for the packets of a given flow. If a frontend fails its health checks, it is removed from the table and no packets are sent to it anymore. When forwarding a packet to a frontend, all the firewall does is swap the destination MAC address of the packet to that of the chosen frontend.
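If you go the OpenBSD route, the pf side of this is fairly small. The fragment below is only a sketch (the interface name and VIP are made up): relayd loads its redirect rules and frontend table into a dedicated pf anchor, so pf.conf mainly needs that anchor plus the filter from step 3:

# /etc/pf.conf fragment (sketch)
ext_if = "em0"
vip    = "1.2.3.4"

# relayd manages its tables and redirect rules inside this anchor
anchor "relayd/*"

block in on $ext_if
# step 3: only allow HTTP/HTTPS to the VIP
pass in on $ext_if proto tcp from any to $vip port { 80 443 }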

In step 5, the packets from the user are received by your load balancer (be it Varnish, nginx, or whatever). At this point, the packet is still destined to your public IP address, so you need to alias your VIP(s) on the loopback interface. This is called DSR (Direct Server Return), because your frontends terminate the TCP connection and the firewall in between only sees simplex traffic (only incoming packets). Your router will route outgoing packets directly back to the ISP's routers. This is especially good for HTTP traffic because requests tend to be smaller than responses, sometimes significantly so. Just to be clear: this isn't an OpenBSD-specific thing; it is widely used on high-traffic websites.
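For reference, aliasing the VIP on a frontend's loopback is typically a one-liner; the commands below are a sketch with a made-up VIP (on Linux you also have to stop the frontend from answering ARP for the VIP, otherwise it will fight with the firewall/router for the address):

# OpenBSD frontend: add the VIP as a host alias on loopback
ifconfig lo0 alias 1.2.3.4 netmask 255.255.255.255

# Linux frontend: same idea, plus ARP tuning so the VIP stays silent on the LAN
ip addr add 1.2.3.4/32 dev lo
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2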

Gotchas:

  • End users will connect directly to your frontend servers because you use DSR. Maybe that was already the case, but if not, you need to make sure they're adequately secured.
  • If you use OpenBSD, beware that the kernel is single-threaded, so the performance of a single CPU core will limit the throughput of a firewall. It might be a problem depending on your type of NIC and the packet rate you're seeing. There are ways to solve this problem (more on this below).

Second strategy: without a firewall

This strategy is more efficient but harder to set up because it depends more on the specifics of the routers you have. The idea is to bypass the firewalls above and have the routers do all the work the firewalls were doing.

You'll need routers that support per-port L3/L4 ACLs, BGP and ECMP, and Policy Based Routing (PBR). Only high-end routers support these features, and they often have extra licensing fees to use BGP. This is typically still cheaper than hardware load balancers, and is also far easier to scale. The good thing about these high-end routers is that they tend to be line-rate (e.g. they can always max out the link, even on 10GbE interfaces, because all the decisions they make are done in hardware by ASICs).

On the ports on which you have your ISP uplinks, apply the ACL that used to be on the firewall (e.g. "only allow 80/tcp and 443/tcp going to this particular IP address"). Then have each one of your frontends maintain a BGP session with your router. You can use the excellent OpenBGPD (if your frontends are on OpenBSD) or Quagga. Your router will ECMP the traffic to the frontends that are healthy (because they're maintaining their BGP sessions). The router will also route the traffic out appropriately using PBR.
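As a rough sketch of what a frontend's announcement looks like with OpenBGPD (the AS numbers and addresses are made up): the frontend announces the VIP, the router installs several equal-cost next hops (one per live frontend) and hashes flows across them; when a frontend dies, its BGP session drops, the route is withdrawn, and it stops receiving traffic.

# /etc/bgpd.conf on one frontend (sketch)
AS 65001
router-id 10.1.2.101

# announce the public VIP, which is also aliased on this frontend's loopback
network 1.2.3.4/32

neighbor 10.1.2.1 {
        remote-as 65000
        descr     "router A"
}

In practice you also want the announcement withdrawn when the HTTP daemon itself is sick, not only when the whole box goes down -- for instance a small watchdog that health-checks the local web server and stops bgpd when the check fails.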

Refinements

  • With the firewall pair solution, it's nice if you can synchronize the TCP states across the firewalls, so that when one firewall fails, everything fails over smoothly to the other one. You can achieve this with pfsync (a rough sketch follows this list).
    • Bear in mind that pfsync will typically double the packet rate on your firewalls.
    • HTTP is a stateless protocol, so it's not the end of the world if you reset all the connections during a firewall failover because you don't use pfsync.
  • If you outgrow a single firewall, you can use ECMP on your router to route your traffic to more than one pair of firewalls.
  • If you use more than one pair of firewalls, you might as well make them all active/active. You can achieve this by having the firewalls maintain a BGP session with the routers, much like the frontends need to maintain one in the second design (without firewalls).
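For completeness, a pfsync setup between the two OpenBSD firewalls is roughly the following (interface names and addresses are made up; use a dedicated, trusted link for the sync traffic):

# on each firewall: carry state-table updates over a dedicated crossover link
ifconfig em3 10.10.10.1/30 up          # 10.10.10.2/30 on the other firewall
ifconfig pfsync0 syncdev em3 up

# in pf.conf: don't filter the sync interfaces themselves
set skip on { em3 pfsync0 }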

Sample relayd config

See also HOWTO at https://calomel.org/relayd.html

vip="1.2.3.4"  # Your public IP address
               # (you can have more than one, but don't need to)
fe1="10.1.2.101"
fe2="10.1.2.102"
fe3="10.1.2.103"
fe4="10.1.2.104"  # You can have any number of frontends.
int_if="em0"
table <fe> { $fe1 retry 2, $fe2 retry 2, $fe3 retry 2, $fe4 retry 2 }
table <fallback> { 127.0.0.1 }

redirect webtraffic {
        listen on $vip port 80
        session timeout 60
        route to <fe> check http "/healthcheck.html" digest "(the sha1sum of healthcheck.html)" interface $int_if
}