What is the difference between Load Balancer and Reverse Proxy?

I'm not clear about the difference between a load balancer and a reverse proxy. They both seem to behave the same way: distributing incoming requests to backend servers.


Solution 1:

Your confusion is reasonable - they are often the same thing, but not always. When you refer to a load balancer you are referring to a very specific thing - a server or device that balances inbound requests across two or more web servers to spread the load. A reverse proxy, however, typically offers some combination of the following features:

  1. load balancing: as discussed above

  2. caching: it can cache content from the web server(s) behind it and thereby reduce the load on the web server(s) and return some static content back to the requester without having to get the data from the web server(s)

  3. security: it can protect the web server(s) by preventing direct access from the internet; it might do this through simple means by just obfuscating the web server(s) or it may have some more active components that actually review inbound requests looking for malicious code

  4. SSL acceleration: when SSL is used, it may serve as the termination point for those SSL sessions so that the work of handling the encryption is offloaded from the web server(s)

I think this covers most of it but there are probably a few other features I've missed. Certainly it isn't uncommon to see a device or piece of software marketed as a load balancer/reverse proxy because the features are so commonly bundled together.
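
As a rough illustration of the load-balancing feature (item 1), here is a minimal sketch of round-robin selection. The class name `RoundRobinBalancer` and the backend addresses are made up for the example; real products usually add health checks and weighting on top of this.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical helper: hands out backends in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        # cycle() repeats the list forever, which gives round-robin order.
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)


# Assumed example addresses, not taken from any real deployment.
balancer = RoundRobinBalancer(["10.0.0.1:80", "10.0.0.2:80"])
picks = [balancer.pick() for _ in range(4)]
print(picks)
```

Each incoming request simply goes to whichever backend `pick()` returns, so the load is spread evenly as long as requests cost roughly the same.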

Solution 2:

Also, a reverse proxy, as the term is commonly used, is specific to web (HTTP) servers.

Load balancers however can deal with a lot of other protocols. While the web (HTTP) is the big idea nowadays, things like DNS, mail (SMTP, IMAP), etc. can be load balanced as well. It's just nowadays when most people think "Internet" or "IP network" they think of the web. There's a bunch more stuff out there that may be more obscure, or more of a niche.

Solution 3:

While the net result (distributing requests between servers) is the same between various load balancers and reverse proxies, the difference is in the method used to distribute the requests.

Some load balancers balance traffic using DNS, resolving the same name to different IPs in round-robin fashion, which effectively redirects requests. This can often be useful when load balancing requests between data centers or other physical locations. It is a poor choice if you need "instant" failover, as you're at the mercy of your clients' DNS servers to honor the TTL you've provided. Cisco's GSS (Global Site Selector) is a good example of DNS-based load balancing.
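
The DNS approach and its failover weakness can be sketched roughly as follows. This is a toy model, not how any particular DNS product works: `RoundRobinZone` and `CachingResolver` are hypothetical names, the addresses come from the documentation ranges, and time is passed in by hand so the TTL behavior is easy to see.

```python
from collections import deque


class RoundRobinZone:
    """Hypothetical authoritative side: rotates the address list per query."""

    def __init__(self, addresses, ttl=30):
        self._addresses = deque(addresses)
        self.ttl = ttl

    def resolve(self):
        answer = list(self._addresses)
        # Rotate so the next query sees a different address first.
        self._addresses.rotate(-1)
        return answer, self.ttl


class CachingResolver:
    """Hypothetical client-side resolver that honors the TTL."""

    def __init__(self, zone):
        self._zone = zone
        self._cached = None
        self._expires = 0.0

    def lookup(self, now):
        # Only ask the zone again once the cached answer has expired.
        if self._cached is None or now >= self._expires:
            answer, ttl = self._zone.resolve()
            self._cached = answer
            self._expires = now + ttl
        return self._cached[0]


zone = RoundRobinZone(["192.0.2.10", "192.0.2.20"], ttl=30)
resolver = CachingResolver(zone)
at_0 = resolver.lookup(now=0)    # fresh query
at_10 = resolver.lookup(now=10)  # still served from the cache
at_30 = resolver.lookup(now=30)  # TTL expired, zone is queried again
```

Even this well-behaved resolver keeps sending the client to the first address until the TTL runs out, so a server that dies at `now=1` stays in use for up to 29 more seconds. A resolver that ignores the TTL would hold on to it for longer still.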

Other load balancers work by rewriting the headers of packets destined for a virtual IP so that they go to the real IP of a server in a farm. This provides real-time load balancing and near-instant failover. An example of this would be Cisco's CSM (Content Switching Module).

Note that in both of the approaches above, the TCP conversation is between the client and the server itself.
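
The header-rewriting approach can be modeled roughly like this. It is a simplified sketch under several assumptions: `Packet` and `VipRewriter` are invented names, only the destination address is rewritten, and flows are tracked by client source address alone. Real devices work on actual IP/TCP headers and also handle the return path.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str      # client address, e.g. "198.51.100.7:40000"
    dst: str      # destination address
    payload: bytes


class VipRewriter:
    """Hypothetical model of a balancer that rewrites the destination header."""

    def __init__(self, vip, real_servers):
        self.vip = vip
        self.real_servers = list(real_servers)
        self._flows = {}  # client source -> chosen real server
        self._next = 0

    def forward(self, packet):
        if packet.dst != self.vip:
            return packet  # not addressed to the virtual IP, leave untouched
        server = self._flows.get(packet.src)
        if server is None:
            # New flow: choose the next real server in rotation and remember it.
            server = self.real_servers[self._next % len(self.real_servers)]
            self._next += 1
            self._flows[packet.src] = server
        # Only the destination changes; source and payload pass through as-is.
        return replace(packet, dst=server)


lb = VipRewriter("203.0.113.5", ["10.0.0.1", "10.0.0.2"])
syn = lb.forward(Packet("198.51.100.7:40000", "203.0.113.5", b"SYN"))
ack = lb.forward(Packet("198.51.100.7:40000", "203.0.113.5", b"ACK"))
other = lb.forward(Packet("198.51.100.8:40001", "203.0.113.5", b"SYN"))
```

Because the source address and payload are untouched, the real server still sees the client's own packets, which is why the TCP conversation remains between client and server.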

A reverse proxy works by accepting the request on behalf of the web server, relaying that request to the web server, and returning the response to the client, optionally caching the result in case a similar request follows.

Note that the client never actually establishes a connection to the web server; rather, the client's conversation is strictly with the proxy.
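
A minimal sketch of that behavior, with the network left out entirely: the backend is just a function, and `ReverseProxy`, `backend`, and the path are hypothetical names chosen for the example. A real proxy would also respect cache-control headers and expire entries.

```python
class ReverseProxy:
    """Hypothetical proxy: answers clients itself, fetching from the backend as needed."""

    def __init__(self, fetch_from_backend):
        self._fetch = fetch_from_backend
        self._cache = {}

    def handle(self, path):
        # Serve a repeated request from the cache without touching the backend.
        if path in self._cache:
            return self._cache[path]
        # Otherwise make the request on the client's behalf and remember the result.
        response = self._fetch(path)
        self._cache[path] = response
        return response


backend_calls = []


def backend(path):
    """Stand-in for the real web server; records every request it receives."""
    backend_calls.append(path)
    return f"content of {path}"


proxy = ReverseProxy(backend)
first = proxy.handle("/index.html")
second = proxy.handle("/index.html")
```

The client only ever calls `proxy.handle`, and after the two requests `backend_calls` contains a single entry: the second response came from the proxy's cache.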