What is the difference between Workers and Threads in Puma?
As the other answer states, this Heroku article is pretty good with explanations of certain configuration items.
However if you need to tune your application on Heroku, or anywhere, then it pays to know how things work.
I think you are almost correct when you say "a worker is a thread inside the puma process". As I understand it, a worker is an operating-system-level process forked from Puma, which can then use threads internally.
As far as I understand, Puma forks its operating system process as many times as you set via the workers configuration, and each forked worker responds to HTTP requests. This gives you parallelism across requests, but it usually takes more memory because each worker holds its own 'copy' of your application code.
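To make the forking idea concrete, here is a minimal sketch in plain Ruby. It is not how Puma manages its workers internally; `fork_workers` is a hypothetical helper, and it assumes a platform that supports `fork` (Linux or macOS). It only shows that each forked process gets its own PID and its own copy of the program's state.

```ruby
# Illustrative only: Puma's real worker management is more involved.
# Forks `count` child processes. Each child increments its own copy of
# `counter` and reports "<pid> <counter>" back to the parent over a pipe.
def fork_workers(count)
  counter = 0
  reader, writer = IO.pipe

  pids = Array.new(count) do
    fork do
      reader.close
      counter += 1                       # changes only this child's copy
      writer.puts "#{Process.pid} #{counter}"
      writer.close
      exit!(0)                           # leave the child without running at_exit hooks
    end
  end

  writer.close                           # parent keeps only the read end
  pids.each { |pid| Process.wait(pid) }
  reports = reader.read.lines.map { |line| line.split.map(&:to_i) }
  reader.close

  { parent_counter: counter, reports: reports }
end

result = fork_workers(3)
result[:reports].each { |pid, seen| puts "worker #{pid} saw counter=#{seen}" }
puts "parent still sees counter=#{result[:parent_counter]}"
```

Every child reports a counter of 1 while the parent's counter is unchanged, which is the isolation (and the memory cost) that separate worker processes give you.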
Each Puma worker then runs multiple threads within its OS process, depending on the threads configuration. These add concurrency within the worker: if one thread is blocked, i.e. busy processing a request, another thread can handle a new request. As stated, this requires your entire application to be thread safe so that, for example, global state set during one request does not 'leak' into another.
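A rough sketch of that thread behaviour, again in plain Ruby rather than Puma itself: `handle_requests` is a hypothetical helper, `sleep` stands in for a blocking database or HTTP call, and the `Mutex` shows the kind of care shared state needs once several threads run in one process.

```ruby
# Runs `jobs` simulated requests on `pool_size` threads. Each "request"
# blocks for `wait` seconds, then records itself in a shared array.
def handle_requests(jobs:, pool_size:, wait: 0.1)
  queue = Queue.new
  jobs.times { |i| queue << i }
  queue.close                            # pop returns nil once the queue is drained

  handled = []
  lock = Mutex.new                       # guards the array shared by all threads

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(pool_size) do
    Thread.new do
      while (job = queue.pop)
        sleep wait                       # blocked here; other threads keep working
        lock.synchronize { handled << job }
      end
    end
  end
  threads.each(&:join)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  { handled: handled.sort, elapsed: elapsed }
end

one_thread   = handle_requests(jobs: 4, pool_size: 1)
four_threads = handle_requests(jobs: 4, pool_size: 4)
puts format("1 thread:  %.2fs", one_thread[:elapsed])
puts format("4 threads: %.2fs", four_threads[:elapsed])
```

With one thread the waits add up; with four they overlap. This only models requests that spend their time waiting, which is where threads help even on MRI.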
You would tune Puma so that the number of workers suits the number of CPUs and the memory available, and then tune the threads depending on how far you want to saturate the host running your application and on how your application behaves. More does not always mean faster or higher request throughput.
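As a starting point for that tuning, a Puma config file might look something like the sketch below. The environment variable names follow the Heroku convention quoted later in this thread; the default values of 2 and 5 are assumptions for illustration only, not recommendations.

```ruby
# config/puma.rb (sketch; the numbers are placeholders to tune for your host)

# Number of forked worker processes; roughly bounded by CPU cores and memory.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

# Minimum and maximum threads per worker.
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads

# Load the app before forking so workers can share memory via copy-on-write.
preload_app!
```

Change one value at a time and measure, since the right numbers depend on how much of each request is spent waiting versus running Ruby code.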
This is a big area and I am not an expert, however...
Puma can spawn many workers, and each worker can use many threads to process the request.
Unicorn does not have threads as far as I know; it just has the worker model.
If you use threads though, you need to make sure that your code is thread safe. This means Rails, any gem you rely on, and your own code.
For maximum performance, you might also want to look into JRuby or Rubinius, which have proper thread support. MRI is restricted by its GIL (global interpreter lock), which lets only one thread run Ruby code at a time.
There is a good article on Heroku which explains how Puma uses workers and threads. You should probably read that and ignore me :)
I just want to emphasize the most important line of the Heroku/Puma article that was referenced here:
Rails maintains its own database connection pool, with a new pool created for each worker process. Threads within a worker will operate on the same pool.
It states that each Worker will have its own Pool. However:
Threads within a worker will operate on the same pool.
This is very important to understand. If Puma runs 5 threads per worker, then database.yml must be configured with a connection pool of at least 5, since each thread could establish its own database connection.
Since each Worker is spawned by a system fork(), the new worker will have its own set of 5 threads to work with, and thus for the new Rails instance created, the database.yml will still be set to a connection pool of 5.
Now the database.yml connection pool and the number of connections your actual database accepts are two different things. The TOTAL number of connections to your database follows a formula that the Heroku docs mention:
A good formula for determining the number of connections each application will require is to multiply the RAILS_MAX_THREADS by the WEB_CONCURRENCY.
What this means is if you are using 2 Workers, each with 5 threads, then 2 * 5 = 10, so your database must be configured to accept 10 concurrent connections.
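The same arithmetic as a small Ruby sketch. `connection_budget` is a hypothetical helper, not part of Rails or Puma; it just separates the per-worker pool size (what goes in database.yml) from the total the database server must accept.

```ruby
# Returns the pool size each worker needs and the total connections the
# database must allow, given Puma's worker and thread counts.
def connection_budget(workers:, threads_per_worker:)
  {
    pool_per_worker: threads_per_worker,            # database.yml `pool`
    total_connections: workers * threads_per_worker # database server limit
  }
end

budget = connection_budget(workers: 2, threads_per_worker: 5)
puts budget[:pool_per_worker]    # => 5
puts budget[:total_connections]  # => 10
```

Anything else that connects to the same database (background job processes, consoles, migrations) needs connections on top of this total.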