Why are there no examples of horizontally scalable software load balancers balancing SSL?
I have a bunch of questions with respect to ssl, local sessions, and load balancing which seem to be interconnected, so I apologize in advance for the length of this question.
I have a website which uses file-based sessions. The nature of the site is that most of it is http, but some sections are ssl. Currently, because of the file-based sessions, any ssl request has to hit the same server as that user's previous http requests.
Because of time constraints, I want to do the easiest possible thing to load balance increased http and ssl traffic.
There seem to be two options for sticky load balancing algorithms:
- ip based
- cookie based
The ip based solution will probably work, but the hashing algorithm may change the server a user goes to when a server goes down or is added, which is undesirable with the current file-based session setup. I also suppose that it is technically possible for a user to legitimately change ips while browsing a website.
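To make the remapping concern concrete, here is a minimal sketch of a plain hash-modulo scheme. It is an assumption about how a simple ip based balancer might pick a server, not the algorithm of any particular product; the server names and client addresses are made up for illustration.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to a server using hash(ip) modulo the pool size."""
    digest = hashlib.md5(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def moved_clients(clients, before, after):
    """Return the clients whose server differs between two pool layouts."""
    return [ip for ip in clients
            if pick_server(ip, before) != pick_server(ip, after)]


# Hypothetical pool and client addresses, for illustration only.
servers = ["web1", "web2", "web3", "web4"]
clients = ["10.0.%d.%d" % (i // 256, i % 256) for i in range(1000)]

# Simulate web4 going down: the pool shrinks from four servers to three.
moved = moved_clients(clients, servers, servers[:-1])
print("%d of %d clients changed server" % (len(moved), len(clients)))
```

With a modulo scheme like this, most clients move when the pool size changes, not only the ones that were on the removed server, and every moved client loses its file-based session. Consistent hashing would reduce the churn, but it cannot eliminate it.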
The cookie based algorithm seems better, but the inability to inspect the cookie when encrypted by ssl seemingly presents its own problems.
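A rough sketch of what cookie based stickiness involves is shown here. The cookie name `SERVERID`, the class, and the server names are hypothetical. The important assumption is that `route` receives the plaintext `Cookie` header, which means this logic can only run at or behind whatever terminates ssl.

```python
import itertools
from http.cookies import SimpleCookie

COOKIE_NAME = "SERVERID"  # hypothetical cookie name chosen for this sketch


class CookieStickyBalancer:
    """Pin a client to a backend by reading and writing a cookie."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._round_robin = itertools.cycle(self.servers)

    def route(self, cookie_header, healthy):
        """Return (server, set_cookie); set_cookie is None if already pinned.

        cookie_header is the decrypted Cookie header value (or None);
        healthy is the set of backends currently passing health checks.
        """
        cookie = SimpleCookie()
        cookie.load(cookie_header or "")
        morsel = cookie.get(COOKIE_NAME)
        if morsel is not None and morsel.value in healthy:
            return morsel.value, None
        # No usable cookie: pick the next healthy backend and pin to it.
        for _ in range(len(self.servers)):
            candidate = next(self._round_robin)
            if candidate in healthy:
                return candidate, "%s=%s; Path=/" % (COOKIE_NAME, candidate)
        raise RuntimeError("no healthy backend available")


balancer = CookieStickyBalancer(["web1", "web2", "web3"])
all_up = {"web1", "web2", "web3"}
first = balancer.route(None, all_up)              # new visitor gets pinned
repeat = balancer.route("SERVERID=web1", all_up)  # returning visitor
failover = balancer.route("SERVERID=web1", {"web2", "web3"})  # web1 is down
```

Unlike ip hashing, adding or removing a server only affects the clients pinned to that server, which is why this approach looks attractive if the cookie can be read.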
I have been googling for examples on how to load balance ssl, and I cannot seem to find any explicit examples of setups which can do cookie based load balancing AND which can deal with increased ssl load by adding another ssl decoder.
Most of the explicit examples I've seen have the ssl decoder (usually hardware, apache_mod_ssl, or nginx) sitting between the browser client and the load balancer. The examples usually seem to have something like this (modified from http://haproxy.1wt.eu/download/1.3/doc/architecture.txt):
        192.168.1.1    192.168.1.11-192.168.1.14
     -------+-----------+-----+-----+-----+
            |           |     |     |     |
         +--+--+      +-+-+ +-+-+ +-+-+ +-+-+
         | LB1 |      | A | | B | | C | | D |
         +-----+      +---+ +---+ +---+ +---+
         apache         4 cheap web servers
         mod_ssl
         haproxy
The ssl decoding part in the above example seems to be a potential bottleneck that is not horizontally scalable.
I've looked at haproxy, and it seems to have a 'mode tcp' option that would allow something like this, which would allow you to have multiple ssl decoders:
                   haproxy
                      |
                -------------
                |           |
         ssl-decoder-1  ssl-decoder2
                |           |
            -------------------
            |        |        |
          web1     web2     web3
However, in such a setup, it appears you would lose the client ip because haproxy is not decoding the ssl: https://cloud-support.engineyard.com/discussions/problems/335-haproxy-not-passing-x-forwarded-for
I've also looked at nginx, and I do not see any explicit examples of horizontally scalable ssl decoders there either. There seem to be many examples of people running nginx as a potential bottleneck. At least one answer (to the question "How to pass Apache SSL traffic trough nginx proxy?") suggests that nginx doesn't even offer the haproxy-like setup where you would lose the ip, saying that nginx "doesn't support transparently passing TCP connections to a backend".
Questions:
- Why don't there seem to be more examples of setups adding more ssl decoders to deal with increased traffic?
- Is it because the ssl decoding step is only a theoretical bottleneck, and practically speaking, one decoder will essentially be enough except for sites with ridiculous traffic?
- Another possible solution that comes to mind is perhaps anybody with such increased ssl needs also has a centralized session store, so it doesn't matter which webserver the client hits on sequential requests. Then you could enable mod_ssl or equivalent on every webserver.
- The haproxy solution cited above seems to work (besides the client IP problem), but has anyone come across a sticky, cookie based software load balancer solution that scales by adding decoders while keeping the client IP? Or is that perhaps technically not possible, because you have to decode the request to get the client IP, in which case we are back to a decoder bottleneck?
Assuming that everything I've said is true, these appear to be my options:
- use ip hashing (bad for users who potentially legitimately change ips, and for server adding and dropping scenarios)
- use nginx or mod_ssl as the 1st program touching the ssl request, this will be a potential ssl decoding bottleneck
- use haproxy as the 1st program touching the ssl request, gaining horizontal ssl scalability, but live with no ips logged at the webserver level for ssl requests (probably temporarily ok)
- over the longer term, move towards a mobile or centralized session store, making sticky sessions unnecessary
Solution 1:
The "simplest thing", in all seriousness, is to move to a centralised session store. You've got to set up all this plumbing with load balancers, haproxy, SSL, and the rest of it, when every bit of session-handling code I've ever seen makes it near-trivial to plug in different storage engines. A bit of code and very, very little extra complexity solves all your problems.
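As a rough illustration of what "plugging in a different storage engine" can look like, here is a minimal sketch. The class and function names are hypothetical, and a plain dict stands in for whatever networked store (a database, a cache server) you would actually use. Session IDs are assumed to be server-generated and safe to use in file names.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class SessionStore(ABC):
    """The only interface the application code depends on."""

    @abstractmethod
    def load(self, session_id):
        """Return the session dict, or an empty dict if none exists."""

    @abstractmethod
    def save(self, session_id, data):
        """Persist the session dict."""


class FileSessionStore(SessionStore):
    """Sessions as files on local disk: only this web server can see them."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, session_id):
        return os.path.join(self.directory, "sess_%s.json" % session_id)

    def load(self, session_id):
        try:
            with open(self._path(session_id), encoding="utf-8") as handle:
                return json.load(handle)
        except FileNotFoundError:
            return {}

    def save(self, session_id, data):
        with open(self._path(session_id), "w", encoding="utf-8") as handle:
            json.dump(data, handle)


class SharedSessionStore(SessionStore):
    """Stand-in for a central store; the dict plays the remote service."""

    def __init__(self, backend):
        self.backend = backend

    def load(self, session_id):
        return json.loads(self.backend.get(session_id, "{}"))

    def save(self, session_id, data):
        self.backend[session_id] = json.dumps(data)


def handle_login(store, session_id, user):
    """Application code: identical whichever store is plugged in."""
    session = store.load(session_id)
    session["user"] = user
    store.save(session_id, session)


# Two web servers with local file sessions: web2 cannot see web1's login.
file_web1 = FileSessionStore(tempfile.mkdtemp())
file_web2 = FileSessionStore(tempfile.mkdtemp())
handle_login(file_web1, "abc123", "alice")
seen_by_file_web2 = file_web2.load("abc123")

# The same two servers sharing one central store: any server will do.
central = {}
shared_web1 = SharedSessionStore(central)
shared_web2 = SharedSessionStore(central)
handle_login(shared_web1, "abc123", "alice")
seen_by_shared_web2 = shared_web2.load("abc123")
```

Once every web server reads sessions from the same place, it no longer matters which server a request lands on, so sticky balancing (and the question of who can read the cookie) drops out of the picture.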
Solution 2:
womble is right about the shared session store making things much easier all around. In addition to his answer, I can expand a bit on the load balancing parts of the question:
Why don't there seem to be more examples of setups adding more ssl decoders to deal with increased traffic?
Modern multi-core PCs can do several thousand SSL transactions per second. And if that becomes a bottleneck, a dedicated appliance from F5, Citrix, Cisco or the like can be even faster. So most sites never outgrow a good single-device SSL & load balancing solution.
Assuming that everything I've said is true, these appear to be my options:
There are options for scaling SSL decryption horizontally, if you come to need this.
The common approach is to use DNS Round Robin to highly available SSL accelerator pairs, i.e. publishing multiple IP addresses for the domain, each IP address pointing to a pair of SSL accelerators.
In this case you could worry about DNS TTL timing out in the middle of a user session, bumping the user to another application server. That should not be a common occurrence, but it could happen. A shared session store is the common solution, but it can be handled in other ways.
As one example you could separate the SSL decryption from the application server balancing. SSL handling is more CPU intensive than basic load balancing, thus a single load balancer should be able to saturate a couple of SSL accelerators. Like this:
Internet --> DNS round robin to multiple SSL accelerators --> plain HTTP to a single HTTP load balancer --> plain HTTP to multiple application servers
As mentioned in the beginning, a shared session store simplifies many things, and is almost certainly a better long-term solution than putting lots of complexity into your SSL / load balancing layer.