Optimal value for Nginx worker_connections

Nginx worker_connections "sets the maximum number of simultaneous connections that can be opened by a worker process. This number includes all connections (e.g. connections with proxied servers, among others), not only connections with clients. Another consideration is that the actual number of simultaneous connections cannot exceed the current limit on the maximum number of open files". I have a few questions about this:

What should be the optimal or recommended value for this?
What are the downsides of using a high number of worker connections?

Solution 1:

Let's take the pragmatic approach.

All these limits were hardcoded and designed in the past century, when hardware was slow and expensive. It's 2016 now, and an average Walmart toaster can process more requests than the default values allow.

The default settings are actually dangerous. Having hundreds of users on a website is nothing impressive.

worker_processes

This is a related setting, so let's explain it while we're on the topic.

nginx as a load balancer:

1 worker for HTTP load balancing.
1 worker per core for HTTPS load balancing.

nginx as a web server:

This one is tricky.

Some applications/frameworks/middleware (e.g. php-fpm) are run outside of nginx. In that case, 1 nginx worker is enough because it's usually the external application that is doing the heavy processing and eating the resources.

Also, some applications/frameworks/middleware can only process one request at a time, and overloading them backfires.

Generally speaking, 1 worker is always a safe bet.

Otherwise, you may put one worker per core if you know what you're doing. I'd consider that route to be an optimization and advise proper benchmarking and testing.
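The worker_processes rules of thumb above can be written down as a small lookup. This is only a sketch of this answer's advice: the role labels and the function name are hypothetical, and nginx itself also accepts `worker_processes auto;`, which picks one worker per available core.

```python
import os
from typing import Optional


def suggest_worker_processes(role: str, cores: Optional[int] = None) -> int:
    """Hypothetical helper encoding the rules of thumb above (not an nginx API)."""
    cores = cores or os.cpu_count() or 1
    if role == "http_lb":
        return 1          # plain HTTP load balancing: one worker is enough
    if role == "https_lb":
        return cores      # TLS is CPU-heavy: one worker per core
    if role == "webserver":
        return 1          # safe bet; go per-core only after benchmarking
    raise ValueError(f"unknown role: {role}")
```

For example, `suggest_worker_processes("https_lb", cores=8)` gives 8, while a web server in front of php-fpm stays at 1.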

worker_connections

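The total number of connections is worker_processes * worker_connections. It is half that in load balancer mode, because each client connection also holds a connection to the upstream server, and both count against worker_connections.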
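As a worked version of that formula, here is a minimal sketch; the function name is hypothetical, and the halving assumes exactly one upstream connection per proxied client connection.

```python
def max_clients(worker_processes: int, worker_connections: int, proxying: bool) -> int:
    """Upper bound on simultaneous clients (hypothetical helper)."""
    total = worker_processes * worker_connections
    # When proxying/load balancing, every client connection is paired with an
    # upstream connection, and both are counted in worker_connections.
    return total // 2 if proxying else total
```

With 4 workers and worker_connections set to 10000, that is 40000 clients when serving directly, or 20000 when proxying.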

Now we're reaching the toaster part. There are many seriously underrated system limits:

The ulimit default is 1k max open files per process on Linux (1k soft, 4k hard on some distros).
systemd limits are about the same as ulimits.
The nginx default is 512 connections per worker.
There might be more: SELinux, sysctl, supervisord (each distro and version is slightly different).
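To see the open-file limit that applies to a process, Python's standard `resource` module exposes the same soft/hard pair that `ulimit -n` and `ulimit -Hn` report. This is a sketch under one important assumption: it reads the limits of the Python process itself, which may differ from what nginx gets when systemd or supervisord start it. The helper names are hypothetical.

```python
import resource
from typing import Tuple


def open_file_limits() -> Tuple[int, int]:
    """Soft and hard RLIMIT_NOFILE for *this* process (not necessarily nginx's)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def fits_in_nofile(worker_connections: int, soft_limit: int) -> bool:
    """Hypothetical check: can one worker hold this many connections as file descriptors?"""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return worker_connections <= soft_limit
```

On a box with the 1k default, `fits_in_nofile(10000, 1024)` is False, which is exactly the situation described above.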

1k worker_connections

The safe default is to put 1k everywhere.

It's high enough to be more than most internal and unknown sites will ever encounter. It's low enough to not hit any other system limits.

10k worker_connections

It's very common to have thousands of clients, especially for a public website. I've lost count of the websites I've seen go down because of the low defaults.

The minimum acceptable for production is 10k. Related system limits must be increased to allow it.
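One way to raise the related limit from inside nginx is the `worker_rlimit_nofile` directive, next to `worker_connections` in the `events` block (at the service level, systemd's `LimitNOFILE=` plays the same role). The sketch below renders such a snippet; the function name and the 2x headroom factor are assumptions, a common rule of thumb to leave descriptors for static files and logs, which count against the open-file limit but not against worker_connections.

```python
def render_nginx_limits(worker_connections: int = 10_000, headroom: int = 2) -> str:
    """Render a minimal nginx config fragment (hypothetical helper)."""
    # Assumed rule of thumb: allow more open files than connections, since
    # served files and log files also consume file descriptors.
    nofile = worker_connections * headroom
    return (
        f"worker_rlimit_nofile {nofile};\n"
        "events {\n"
        f"    worker_connections {worker_connections};\n"
        "}\n"
    )
```

Called with the defaults, it produces `worker_rlimit_nofile 20000;` followed by an `events` block with `worker_connections 10000;`.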

There is no such thing as a too-high limit (a limit simply has no effect if there are no users). However a too-low limit is a very real thing that results in rejected users and a dead site.

More than 10k

10k is nice and easy.

We could set an arbitrary limit of a billion (1000kk), since it's only a limit after all, but that doesn't make much practical sense: we never get that traffic and couldn't handle it anyway.

Let's stick to 10k as a reasonable setting. Services that aim for (and can really handle) more will require special tuning and benchmarking.

Special Scenario: Advanced Usage

Sometimes we know that the server doesn't have many resources, and we expect spikes that we can't do much about. We'd rather refuse users than try to serve them all. In that case, set a reasonable connection limit and configure helpful error messages and handling.
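The "refuse rather than try" behaviour is, in nginx terms, what the `limit_conn` directive does: connections over the limit get an error status, 503 by default (configurable with `limit_conn_status`). The class below is a hypothetical toy model of that admission rule, not nginx code.

```python
class ConnectionCap:
    """Toy model: admit up to `limit` concurrent connections, reject the rest."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.active = 0

    def try_open(self) -> int:
        # Over the cap: answer immediately with an error instead of queueing.
        if self.active >= self.limit:
            return 503
        self.active += 1
        return 200

    def close(self) -> None:
        if self.active > 0:
            self.active -= 1
```

With a cap of 2, three simultaneous attempts yield 200, 200, 503, and a slot frees up again once a connection closes.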

Sometimes the backend servers work well, but only up to a certain load; beyond that, everything goes south quickly. We'd rather slow down than have the servers crash. In that case, configure queuing with strict limits and let nginx absorb the heat while requests are drained at a capped rate.
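In nginx this kind of queuing is usually done with `limit_req_zone` (which sets a `rate`) and `limit_req` (which sets a `burst`); the nginx documentation describes it as the "leaky bucket" method. The function below is a simplified tick-based model of that idea, with hypothetical names: it is not how nginx implements it internally, just a way to see that the backend never receives more than `rate` requests per tick.

```python
from typing import List, Tuple


def simulate_leaky_bucket(arrivals: List[int], rate: int, burst: int) -> Tuple[List[int], int]:
    """arrivals[t] = requests arriving in tick t.

    Returns (requests forwarded to the backend per tick, total rejected).
    """
    queue = 0
    forwarded: List[int] = []
    rejected = 0
    for n in arrivals:
        waiting = queue + n            # already-queued requests go first (FIFO)
        out = min(waiting, rate)       # the backend sees at most `rate` per tick
        leftover = waiting - out
        queue = min(leftover, burst)   # up to `burst` requests wait in the queue
        rejected += leftover - queue   # everything beyond that is refused
        forwarded.append(out)
    return forwarded, rejected
```

A spike of 10 requests with `rate=2` and `burst=5` is spread over four ticks as 2, 2, 2, 1, and the 3 requests that do not fit in the queue are rejected.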

Solution 2:

ulimit -a will tell you how many open files your system allows a process to use.

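Also, net.ipv4.ip_local_port_range sets the range of local ports used for outgoing connections, which caps how many sockets one local IP can have open at once to the same destination (for example, a single upstream server).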

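So your worker_connections cannot usefully be higher than either of those; beyond that, new client connections wait in kernel queues. Note that net.core.netdev_max_backlog caps the packets queued on the input side of the network interface, while the queue of TCP connections waiting to be accepted is capped by net.core.somaxconn and the listen backlog.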
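As a sketch of the "cannot be more than either of those" check, the code below parses the port range the kernel exposes under `/proc/sys/net/ipv4/` and takes the smaller of the two limits. The helper names are hypothetical, and treating the port count as a ceiling on worker_connections only applies when the connections go to a single upstream address, as described above.

```python
def local_port_count(port_range_text: str) -> int:
    """Parse the 'low<TAB>high' format of ip_local_port_range."""
    low, high = map(int, port_range_text.split())
    return high - low + 1


def read_local_port_count(path: str = "/proc/sys/net/ipv4/ip_local_port_range") -> int:
    with open(path) as f:
        return local_port_count(f.read())


def worker_connections_ceiling(nofile_soft: int, port_count: int) -> int:
    # Whichever limit is lower is the one you hit first.
    return min(nofile_soft, port_count)
```

For example, a range of `32768 60999` gives 28232 ports, so with a 1024 open-file limit the ceiling is 1024.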

Keep in mind that if you're using nginx as a reverse proxy, it uses two sockets per connection. You might want to experiment with net.ipv4.tcp_fin_timeout and other kernel TCP timeouts so that sockets change state more quickly. Also note that each socket allocates memory from the kernel's TCP memory, and you can set limits on that memory with sysctl (for example net.ipv4.tcp_mem). You can put more pressure on RAM as long as you have enough CPU and file handles.

FYI, given enough computing resources, it's possible for one server with around 32 GB of RAM and some virtual network interfaces to handle 1 million simultaneous connections with some kernel tuning via sysctl. In my tests with more than 1 million connections and a payload of around 700 bytes, the server took around 10 seconds to update about 1.2 million simultaneous clients. The next step was to increase the network bandwidth by bonding some extra NICs and ditching the virtual NICs. Pseudo near-real-time communication with more than 1.2 million clients is achievable, depending on the payload, the bandwidth, and what counts as a reasonable time to update all clients.
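The numbers in that test make the bandwidth bottleneck easy to check with a little arithmetic. The sketch below (hypothetical function name) counts payload bytes only and ignores TCP/IP and framing overhead, so the real wire rate is somewhat higher.

```python
from typing import Tuple


def payload_throughput(clients: int, payload_bytes: int, seconds: float) -> Tuple[int, float]:
    """Return (total payload bytes, payload rate in Mbit/s); protocol overhead ignored."""
    total_bytes = clients * payload_bytes
    mbit_per_s = total_bytes / seconds * 8 / 1e6
    return total_bytes, mbit_per_s
```

For 1.2 million clients, 700 bytes each, pushed out in 10 seconds, that is 840 MB of payload, or about 672 Mbit/s. That is a large share of a single gigabit link, which suggests why adding bandwidth was the next step.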

Happy tuning!

Solution 3:

The appropriate sizing can be discovered through testing, as it varies with the type of traffic nginx is handling.

In theory, nginx can handle max_clients = worker_processes * worker_connections, with worker_processes set to the number of processors.

To find the number of processors, use: grep processor /proc/cpuinfo | wc -l
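The same processor count can be computed in Python. This sketch (hypothetical helper name) mirrors the grep pipeline but only counts lines that start with `processor`, and `os.cpu_count()` is the standard-library shortcut for the same number.

```python
import os


def count_processors(cpuinfo_text: str) -> int:
    """Count 'processor : N' entries, like `grep processor /proc/cpuinfo | wc -l`."""
    return sum(1 for line in cpuinfo_text.splitlines() if line.startswith("processor"))


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("from /proc/cpuinfo:", count_processors(f.read()))
    print("os.cpu_count():", os.cpu_count())
```

Multiply the result by worker_connections to get the theoretical max_clients from the formula above.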

Solution 4:

Marcel's answer really deserves to be upvoted! If the ulimit is at its default of around 1k, worker_connections should be set to around the same value; otherwise there is no benefit to setting worker_connections to 10k.

If worker_connections is higher than those limits, requests will queue and nginx will close sockets.

A single process can open as many connections as its ulimits allow. worker_processes * worker_connections is the formula (halved when load balancing or proxying), but the ulimits must also be taken into account to arrive at a reasonable value. Setting worker_connections to a really high value may backfire, because the ulimits will be the limiting factor.
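Putting that point into the formula: each worker is bounded by whichever is lower, worker_connections or its open-file limit. This sketch uses a hypothetical helper name and assumes a finite soft limit and one upstream connection per client when proxying.

```python
def effective_max_clients(worker_processes: int, worker_connections: int,
                          nofile_soft: int, proxying: bool) -> int:
    """Upper bound once the per-process open-file limit is taken into account."""
    per_worker = min(worker_connections, nofile_soft)  # the ulimit caps each worker
    total = worker_processes * per_worker
    return total // 2 if proxying else total
```

With 4 workers, worker_connections 10000 and a 1024 ulimit, the bound is 4096 clients (2048 when proxying), the same as if worker_connections had been set to 1024.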