Terminating a high volume of SSL connections cost-effectively on EC2
I have recently set up a Node.js based WebSocket server that has been tested to handle around 2,000 new connection requests per second on a small EC2 instance (m1.small). Considering the cost of an m1.small instance, and the ability to put multiple instances behind a WebSocket-capable proxy server such as HAProxy, we are very happy with the results.
However, we realised we had not done any testing using SSL yet, so I looked into a number of SSL options. It became apparent that terminating SSL connections at the proxy server is ideal, because the proxy server can then inspect the traffic and insert headers such as X-Forwarded-For so that the server knows which IP the request came from.
The SSL termination solutions I looked at were Pound, stunnel and stud, all of which allow incoming connections on port 443 to be terminated and then passed on to HAProxy on port 80, which in turn passes the connection on to the web servers. Unfortunately, I found that sending traffic to the SSL termination proxy server on a c1.medium (High CPU) instance very quickly consumed all CPU resources, and at a rate of only 50 or so requests per second. I tried all three of the solutions listed above, and all of them performed roughly the same, I assume because under the hood they all rely on OpenSSL anyway. I also tried a 64-bit extra large High CPU instance (c1.xlarge) and found that performance only scales linearly with cost. So based on EC2 pricing, I'd need to pay roughly $600 per month for 200 SSL requests per second, as opposed to $60 per month for 2,000 non-SSL requests per second. The former price becomes economically unviable very quickly when we start planning to accept 1,000s or 10,000s of requests per second.
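To make the gap concrete, here is a rough sketch of the arithmetic using only the approximate figures quoted above. The function name `costPerRequestPerSecond` is made up for illustration, and the prices are the estimates from this post, not current EC2 pricing.

```javascript
// Monthly cost divided by sustained new connections per second.
// Figures are the rough estimates from the post, not real price data.
function costPerRequestPerSecond(monthlyCostUsd, requestsPerSecond) {
  return monthlyCostUsd / requestsPerSecond;
}

const sslCost = costPerRequestPerSecond(600, 200);    // SSL termination tier
const plainCost = costPerRequestPerSecond(60, 2000);  // plain WebSocket tier

console.log('SSL:     $' + sslCost.toFixed(2) + ' per req/s per month');
console.log('Non-SSL: $' + plainCost.toFixed(2) + ' per req/s per month');
console.log('Ratio:   ' + Math.round(sslCost / plainCost) + 'x');
```

On these numbers, each SSL connection per second costs about two orders of magnitude more than a non-SSL one.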
I also tried terminating the SSL using Node.js' https server, and the performance was very similar to Pound, stunnel and stud, so no clear advantage to that approach.
So what I am hoping someone can help with is advising how I can get around this ridiculous cost we have to absorb to provide SSL connections. I have heard that SSL hardware accelerators provide much better performance as the hardware is designed for SSL encryption and decryption, but as we are currently using Amazon EC2 for all of our servers, using SSL hardware accelerators is not an option unless we have a separate data centre with physical servers. I am just struggling to see how the likes of Amazon, Google, Facebook can provide all their traffic over SSL when the cost of this is so high. There must be a better solution out there.
Any advice or ideas would be greatly appreciated.
Thanks Matt
Solution 1:
Firstly, good on you for benchmarking to start with. My first question is what key size you're using. It seems to me you should be able to terminate far more than 200 connections per second. If you're using a key size larger than 1024 bits, know that performance drops off very quickly.
If you're using a smaller key and still running into issues, I'd take a strong look at the GPU instances that EC2 offers. SSLShader might be a cost-effective change-over after a certain number of connections per second.
Also, investigating @ceejayoz's mention of Elastic Load Balancer has merit.
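One way to sanity-check the key-size point on your own instance type is to time RSA private-key operations directly, since that is typically the most expensive server-side step in a full handshake. The following is only a rough sketch: it assumes a reasonably recent Node.js (for `crypto.generateKeyPairSync` and `crypto.sign`), the helper name `benchmarkRsaSign` is made up, and signing a small buffer is a stand-in for a handshake rather than a real SSL benchmark.

```javascript
const crypto = require('crypto');

// Measure RSA signatures per second for each key size (in bits).
// This approximates handshake CPU cost; it is not a full TLS benchmark.
function benchmarkRsaSign(keySizes, iterations) {
  const payload = Buffer.from('small payload used for timing only');
  const results = {};
  for (const bits of keySizes) {
    // Throwaway key generated just for the measurement.
    const { privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: bits });
    const start = process.hrtime.bigint();
    for (let i = 0; i < iterations; i++) {
      crypto.sign('sha256', payload, privateKey);
    }
    const elapsedSec = Number(process.hrtime.bigint() - start) / 1e9;
    results[bits] = iterations / elapsedSec;
  }
  return results;
}

const rates = benchmarkRsaSign([1024, 2048], 50);
for (const bits of Object.keys(rates)) {
  console.log(bits + '-bit RSA: ' + rates[bits].toFixed(0) + ' signatures/sec');
}
```

Comparing the two rates on a c1.medium should show whether the key size alone explains the 50 requests per second ceiling.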
Solution 2:
You're possibly doing the benchmarking wrong. I doubt you're really expecting 200 unique new SSL visitors every second? If any of those connections are re-connections from people who recently visited, you should be using SSL session caching, along these lines:
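var tlsSessionStore = {}; // in-memory cache, keyed by the session ID as a hex string
server.on('newSession', function(id, data, cb) { tlsSessionStore[id.toString('hex')] = data; if (cb) cb(); }); // newer Node versions pass cb and require it to be called
server.on('resumeSession', function(id, cb) { cb(null, tlsSessionStore[id.toString('hex')] || null); });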
And, of course, your benchmark needs to use a mix of brand-new connections and resumed/reused sessions in whatever proportion makes sense for your application.
Also, the ciphers and key sizes you choose, as mentioned earlier, probably affect the speed too.
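The plain-object cache above grows without bound, so for a long-running server you would probably want something with a size limit and an expiry. Below is a minimal sketch of that idea. The names `createSessionStore` and `attachSessionCache` are hypothetical, the limits are arbitrary, and `server` is assumed to be an existing `tls.Server` or `https.Server`. It only helps clients that resume by session ID.

```javascript
// Bounded in-memory TLS session cache (illustrative sketch only).
function createSessionStore(maxEntries, ttlMs) {
  const entries = new Map(); // Map keeps insertion order: oldest entry first

  return {
    save(id, data, now = Date.now()) {
      const key = id.toString('hex'); // session IDs arrive as Buffers
      entries.delete(key);            // re-insert so the entry counts as newest
      entries.set(key, { data: data, expires: now + ttlMs });
      if (entries.size > maxEntries) {
        entries.delete(entries.keys().next().value); // evict the oldest entry
      }
    },
    load(id, now = Date.now()) {
      const key = id.toString('hex');
      const entry = entries.get(key);
      if (!entry) return null;
      if (entry.expires <= now) {     // expired: drop it and force a full handshake
        entries.delete(key);
        return null;
      }
      return entry.data;
    },
    size() { return entries.size; }
  };
}

// Wire the store into a server's session events.
function attachSessionCache(server, store) {
  server.on('newSession', function (id, data, cb) {
    store.save(id, data);
    if (cb) cb(); // newer Node versions require this callback to be called
  });
  server.on('resumeSession', function (id, cb) {
    cb(null, store.load(id));
  });
}
```

Usage would be something like `attachSessionCache(server, createSessionStore(10000, 5 * 60 * 1000))`. With several processes or instances behind the proxy, the store would need to be shared between them for resumption to work reliably.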