Nginx on RHEL: huge performance difference between SSL and non-SSL

I'm in the process of setting up an Nginx 1.8 reverse proxy.

In short -

When serving local HTML content, HTTP is up to 50x faster than HTTPS.

When proxying (proxy_pass) to a backend, HTTP is up to 7x faster than HTTPS.

OS is RHEL7

Hardware:

2 core VMWare Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz
cpu MHz         : 1897.802
cache size      : 15360 KB
bogomips        : 3795.60
1 Gbit network card

The benchmarking client is Apache Bench (ab), 1 hop away, with a 1 ms ping. Apache Bench negotiates the following TLS protocol and cipher when running:

TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256

Server SSL certificate 2048-bit RSA. OCSP Stapling is on and verified.

/etc/sysctl.conf has

net.ipv4.ip_local_port_range=1024 65000
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=15
net.core.netdev_max_backlog=4096
net.core.rmem_max=16777216
net.core.somaxconn=4096
net.core.wmem_max=16777216
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_max_tw_buckets=400000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_wmem=4096 65536 16777216
vm.min_free_kbytes=65536

/etc/security/limits.conf has

nginx   soft    nofile  65536
nginx   hard    nofile  65536

Nginx is configured with

server {
  listen 443 ssl deferred backlog=1024;
  listen 80 deferred backlog=1024;

  server_name SERVERNAME;

  client_max_body_size 10m;

  ssl_stapling on;
  ssl_stapling_verify on;
  ssl_trusted_certificate path_to_/certificateAndChain.cer;
  ssl_certificate path_to_/certificateAndChain.cer;
  ssl_certificate_key path_to_/private.key;
  ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
  ssl_ciphers "EECDH+AES:EECDH+AESGCM:EDH+AESGCM:ECDHE-RSA-AES128-SHA:ECDHE-RSA-AES128-GCM-SHA256:AES128+EECDH:D$
  ssl_prefer_server_ciphers on;
  ssl_session_cache shared:SSL:32m;
  ssl_session_timeout 1m;

  #resolver 8.8.8.8 8.8.8.4 valid=1m;
  #resolver_timeout 5s;

  location / {
   proxy_pass_header Server;
   proxy_set_header Host $http_host;
   proxy_set_header X-Real-IP $remote_addr;
   proxy_set_header X-Forwarded-For $remote_addr;
   proxy_set_header X-Scheme $scheme;
   proxy_connect_timeout 43200000;
   proxy_read_timeout 43200000;
   proxy_send_timeout 43200000;
   proxy_buffering off;
   proxy_http_version 1.1;
   proxy_set_header Connection "";

   proxy_pass http://IPADDRESS/;

  }

  location /localtest {
    root /var/www/localtest;
    index index.html;
  }
}

Actual results:

Serving local HTML content HTTP

ab -c200 -n20000 http://SERVERNAME/localtest/index.html
Requests per second:    12751.64 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    4   2.3      4      11
Processing:     2   12   7.3      9      96
Waiting:        1   10   7.7      7      96
Total:          2   16   6.6     14     100

HTTPS:

Requests per second:    252.28 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       12  651 288.1    694    1470
Processing:     0  141 134.4    101    1090
Waiting:        0  101 124.3     65    1089
Total:         15  792 276.7    809    1641

Proxying to Apache, 1ms ping, 1 hop away.

HTTP

Requests per second:    1584.88 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   2.3      1       8
Processing:     4  141 309.6     30    1244
Waiting:        4  141 309.7     29    1244
Total:         10  143 310.3     31    1248

HTTPS

Requests per second:    215.99 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       14 1131 622.3   1137    2030
Processing:     4  474 413.2    313    1814
Waiting:        1  399 405.6    257    1679
Total:         26 1605 769.6   1699    3306

Benchmarks are lies: they don't reflect reality, but they can be a useful tool for detecting bottlenecks. You have to understand the benchmark, though. Since you omit essential details needed to interpret the results, it may be that you don't fully understand what can affect them.

In particular, the size of the test payload and detailed CPU load figures for the server and the client are missing. You may therefore already be hitting CPU limits on the client or on the server. It might also mainly be a matter of the additional round trips the requests need. Let's look at HTTP vs. HTTPS in more detail:

ab -c200 -n20000 http://SERVERNAME/localtest/index.html

You've configured ab to use 200 concurrent requests. The size of the request is unknown, so we can assume that there is only a minimal payload. You are also not using HTTP keep-alive, which means there will be a new TCP connection for each request. I doubt that Apache Bench does TLS session resumption, so there will be a full handshake each time. This gives you:

  • HTTP: 1 RTT for the TCP handshake and another RTT for a minimal HTTP request and response. This might also include the connection close already (implementation dependent). This means 2 RTT and minimal data transfer.
  • HTTPS adds on top of this:

    • 2 RTT for a full TLS handshake and probably also 1 RTT for the TLS shutdown. Just because of a total of 5 RTT for HTTPS vs. 2 RTT for plain HTTP you should see a large drop in performance, i.e. from about 13000 req/s to 5200 req/s (i.e. 2/5).
    • The data transferred for the TLS handshake alone might even be larger than the payload of your simple HTTP-only request (Edit: based on the comment, the size of your responses varies from 60 bytes to 50 KB, so this is probably not that relevant).
    • On top of that, the TLS handshake needs a lot of computation on both the client and the server side, and even more because you are using ECDHE; see https://securitypitfalls.wordpress.com/2014/10/06/rsa-and-ecdsa-performance/.
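The round-trip argument above can be checked with a little arithmetic. The following is only a sketch of that simple model: it assumes throughput scales inversely with the number of round trips per request and takes the two req/s figures from the question. The function name is made up for illustration.

```python
def rtt_scaled_rate(http_rate: float, http_rtts: int = 2, https_rtts: int = 5) -> float:
    """Scale a measured HTTP rate by the ratio of round trips per request.

    Assumes the benchmark is purely latency-bound, which is a simplification.
    """
    return http_rate * http_rtts / https_rtts


measured_http = 12751.64   # req/s, local HTML over HTTP (from the question)
measured_https = 252.28    # req/s, local HTML over HTTPS (from the question)

predicted = rtt_scaled_rate(measured_http)
print(f"RTT-only prediction: {predicted:.0f} req/s")
print(f"Measured HTTPS:      {measured_https:.0f} req/s")
print(f"Gap not explained by round trips: {predicted / measured_https:.1f}x")
```

Under this model the extra round trips account for only part of the drop; the remaining gap of roughly 20x has to come from something else, which is where the handshake computation comes in.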

The computations during the TLS handshake need lots of CPU time, which is why it would have been important to provide information about the CPU load. It might be that you are simply hitting the maximum the CPU can do, either at the server or at the client. Please also note that Apache Bench is single-threaded, so it is enough to max out a single CPU core even if the others are idle. And even if you use multiple threads, the computation still takes time. Using openssl speed does not reflect what is really done inside the TLS handshake, and it also tests only the maximum speed of a single thread, not multiple computations in parallel with all the cache thrashing etc. involved.
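To get a feeling for what "hitting the maximum the CPU can do" would mean here, one can turn the measured rate into a CPU time budget per request. This is a rough sketch under the assumption that one side is fully CPU-bound, which the question does not confirm; the helper name is hypothetical.

```python
def cpu_ms_per_request(req_per_sec: float, cores: int) -> float:
    """CPU milliseconds available per request if `cores` cores are fully busy."""
    return cores * 1000.0 / req_per_sec


https_rate = 252.28  # req/s, local HTML over HTTPS (from the question)

# Server: 2 cores according to the question.
print(f"server budget: {cpu_ms_per_request(https_rate, 2):.2f} ms per request")
# Client: ab is single-threaded, so at most 1 core is usable.
print(f"client budget: {cpu_ms_per_request(https_rate, 1):.2f} ms per request")
```

If the CPU time actually spent on a full handshake on either machine is close to its budget, that machine is the bottleneck. Watching per-core load during the run would show which one it is.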

Thus, while this might be an interesting benchmark to see what is possible, it does not reflect reality in most cases. It is a fact that TLS can reduce performance a lot, but common HTTP traffic has larger requests, HTTP keep-alive and TLS session reuse, all of which reduce the impact of the costly TLS handshake.
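How much keep-alive and session reuse help can be sketched with the same round-trip counting as above. The model below assumes 1 RTT for TCP setup, 2 RTT for a full TLS handshake, 1 RTT for a resumed (abbreviated) TLS 1.2 handshake, 1 RTT for the TLS shutdown and 1 RTT per request; it ignores CPU cost entirely, and the function name is made up for illustration.

```python
def avg_rtts_per_request(requests_per_conn: int, tls: bool, resumed: bool = False) -> float:
    """Average round trips per request when a connection serves several requests."""
    setup = 1  # TCP handshake
    if tls:
        setup += 1 if resumed else 2  # abbreviated vs. full TLS handshake
        setup += 1                    # TLS shutdown
    return (setup + requests_per_conn) / requests_per_conn


for k in (1, 10, 100):
    http = avg_rtts_per_request(k, tls=False)
    https = avg_rtts_per_request(k, tls=True)
    print(f"{k:>3} requests/connection: HTTP {http:.2f} RTT, HTTPS {https:.2f} RTT")
```

With one request per connection this reproduces the 2 vs. 5 RTT from the benchmark; with ten requests per connection the difference shrinks to 1.1 vs. 1.4 RTT per request.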

But if the benchmark is actually limited by server performance and not by client performance, the setup might reflect servers used for tracking, where you have only a small response (e.g. a 1x1 pixel) requested from lots of different sites without any kind of TLS session reuse or HTTP keep-alive.


The FIRST HTTPS request really is slower because of the TLS negotiation, and your benchmark only tests that.

A real-life client will make a lot of requests (one for the HTML page, and several for JS/CSS/images).

With TLS session tickets, the full TLS negotiation is skipped after the first request.

Until the session ticket expires, HTTPS requests will be just a little slower than HTTP. But if you use SPDY or HTTP/2, then HTTPS can even be FASTER than HTTP.