How can I balance incoming web traffic among N Apache servers?

Solution 1:

Just about any "reverse proxy" will do what you ask.

For example, Varnish, Pound and HAProxy are all good at what they do. They have their differences, but for what you are asking, any of them will do. Personally, I think you'd be best off with HAProxy, but that's just a guess.
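At their simplest, these proxies hand each new request to the next backend in turn and skip servers they consider down. The Python sketch below illustrates that idea only. It is not how Varnish, Pound or HAProxy are implemented, and the class name and backend addresses are hypothetical.

```python
import itertools
import threading


class RoundRobinPool:
    """Hand out backends in rotation, skipping ones marked as down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._down = set()
        self._cycle = itertools.cycle(self._backends)
        self._lock = threading.Lock()  # requests may arrive on several threads

    def mark_down(self, backend):
        with self._lock:
            self._down.add(backend)

    def mark_up(self, backend):
        with self._lock:
            self._down.discard(backend)

    def next_backend(self):
        with self._lock:
            # Try at most one full rotation before giving up.
            for _ in range(len(self._backends)):
                candidate = next(self._cycle)
                if candidate not in self._down:
                    return candidate
        raise RuntimeError("no healthy backends")


if __name__ == "__main__":
    # Hypothetical Apache backends.
    pool = RoundRobinPool(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
    pool.mark_down("10.0.0.2:80")
    print([pool.next_backend() for _ in range(4)])
    # prints ['10.0.0.1:80', '10.0.0.3:80', '10.0.0.1:80', '10.0.0.3:80']
```

A real proxy marks servers down itself, based on failed connections or periodic health checks, rather than waiting to be told.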

You might want to read an article about load balancers to help you decide which kind you need: http://1wt.eu/articles/2006_lb/

Alternatively, you might consider using a hosted service for this, such as running your software on Amazon's Elastic Compute Cloud (EC2) and using their Elastic Load Balancing.

Solution 2:

First, there is an important question to answer:
do you need user sessions to be handled by the load balancer(s), so that each user is always sent to the same web server (as long as it is alive)?

  • sessions not required: in this case, you should use the efficient nginx program as a load balancer. The configuration is easy to set up: you basically only have to list the web servers in an upstream upstream_name { server server1; ... server serverN; } block, then, for a given domain, add a simple proxy_pass http://upstream_name; directive.
    See the Nginx wiki.

  • sessions required: there is a similar setting for Pound, where you indicate the name of the cookie that holds the session ID (ID MYCOOKIENAME), then a list of BackEnd sections, one for each of your servers.
    See, for instance, this Pound setup example.
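The cookie-based session handling described above can be sketched in Python as follows, assuming the application itself sets a session cookie named MYCOOKIENAME (the name used above) and the balancer only reads it. A request without the cookie gets the next live backend in rotation, while a known session keeps going to the server it was first bound to, unless that server is down. This illustrates the technique only and is not Pound's code; `StickyBalancer` and the backend names are hypothetical, and the session expiry (TTL) a real balancer applies is left out.

```python
import itertools
from http.cookies import SimpleCookie


class StickyBalancer:
    """Route requests carrying the same session cookie to the same backend."""

    def __init__(self, backends, cookie_name="MYCOOKIENAME"):
        self.backends = list(backends)
        self.cookie_name = cookie_name
        self.alive = set(self.backends)  # updated by health checks (not shown)
        self._rotation = itertools.cycle(self.backends)
        self._sessions = {}  # session ID -> backend

    def _pick_new(self):
        # Plain round-robin over the backends that are currently alive.
        for _ in range(len(self.backends)):
            backend = next(self._rotation)
            if backend in self.alive:
                return backend
        raise RuntimeError("no live backends")

    def route(self, cookie_header):
        """Return the backend for a request, given its raw Cookie header."""
        cookies = SimpleCookie()
        cookies.load(cookie_header or "")
        morsel = cookies.get(self.cookie_name)
        if morsel is None:
            # No session yet: any live backend will do.
            return self._pick_new()
        session_id = morsel.value
        backend = self._sessions.get(session_id)
        if backend is None or backend not in self.alive:
            # New session, or its server died: bind it to a live backend.
            backend = self._pick_new()
            self._sessions[session_id] = backend
        return backend
```

Note that when a bound server dies, the session moves to another backend, which only helps if that backend can recover or rebuild the user's session state.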

When the need for several load balancers arises, you may want to go for a heartbeat configuration. It can either ensure that only one balancer holds the virtual IP for a given domain at any time (if sessions are required), or, for instance, let both balancers be active and feed DNS with both IP addresses. Maybe this should be detailed in another question when it becomes necessary, as the tools evolve quickly.
See also this link, for instance.
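The heartbeat idea boils down to a standby balancer watching for periodic "I am alive" messages from the active one and claiming the virtual IP once they stop. Below is a minimal Python sketch of that decision, assuming a fixed dead time. The actual IP takeover is left as a comment, and the class and its methods are hypothetical, not the API of any heartbeat package.

```python
import time


class HeartbeatMonitor:
    """Decide when a standby balancer should claim the virtual IP.

    Assumes the active balancer sends a heartbeat every few seconds;
    if none arrives within `deadtime` seconds, the standby takes over.
    """

    def __init__(self, deadtime, clock=time.monotonic):
        self.deadtime = deadtime
        self.clock = clock  # injectable so the logic can be tested
        self.last_seen = clock()
        self.holds_vip = False

    def heartbeat_received(self):
        self.last_seen = self.clock()

    def check(self):
        """Return True if this node should (now) hold the virtual IP."""
        if not self.holds_vip and self.clock() - self.last_seen > self.deadtime:
            # Hypothetical hook: a real setup would bring up the virtual IP
            # here and announce it (e.g. gratuitous ARP) so traffic moves.
            self.holds_vip = True
        return self.holds_vip
```

Real cluster software also has to handle failing back to the original node and avoiding split-brain, where both nodes claim the address at once; this sketch ignores both.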

Solution 3:

You should have a very good reason before introducing additional complexity and a single point of failure into your architecture.

Round-robin DNS load balancing

  • costs nothing
  • is simple to implement and manage
  • implements failover at the client, the only place where failure can be reliably detected
  • implicitly supports server affinity but still allows failover, without the session-management problems associated with sticky sessions
  • requires no additional software / hardware / configuration on cluster nodes
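Client-side failover with round-robin DNS works because the name resolves to several addresses and a well-behaved client tries the next one when a server does not answer. Here is a rough Python sketch of that client behavior using the standard `socket.getaddrinfo` and `socket.create_connection`. The function name is hypothetical, and the resolver and connector are injectable only so the logic can be exercised without a network.

```python
import socket


def connect_with_failover(host, port, timeout=3.0,
                          resolve=socket.getaddrinfo,
                          connect=socket.create_connection):
    """Try each address DNS returns for `host` until one accepts."""
    errors = []
    for _family, _type, _proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        address = sockaddr[:2]  # (ip, port); IPv6 sockaddrs have 4 fields
        try:
            return connect(address, timeout=timeout)
        except OSError as exc:
            # This server is down or unreachable: try the next record.
            errors.append((address, exc))
    raise OSError(f"all addresses for {host} failed: {errors}")
```

For example, `sock = connect_with_failover("www.example.com", 80)` returns a socket connected to the first server that accepts, or raises `OSError` if none does. Browsers implement their own variant of this, with their own timeouts and ordering.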

The amount of misinformation put about regarding round-robin amazes me. If I were a cynical person, I might wonder whether there is any connection with the vendors who produce big, expensive load-balancing hardware.

The only points I will concede are that:

  1. IPv4 addresses are becoming scarce and therefore expensive, but they are still much, much cheaper than, say, a Cisco CSS.

  2. Increasingly, the internet runs on web services, and not all developers implement DNS support according to the specs. But every browser I've ever used works as it should.