When is the thread pool used?
Solution 1:
Your understanding of how node works isn't correct... but it's a common misconception, because the reality of the situation is actually fairly complex, and typically boiled down to pithy little phrases like "node is single threaded" that over-simplify things.
For the moment, we'll ignore explicit multi-processing/multi-threading through cluster and webworker-threads, and just talk about typical non-threaded node.
Node runs in a single event loop. It's single threaded, and you only ever get that one thread. All of the javascript you write executes in this loop, and if a blocking operation happens in that code, then it will block the entire loop and nothing else will happen until it finishes. This is the typically single threaded nature of node that you hear so much about. But, it's not the whole picture.
Certain functions and modules, usually written in C/C++, support asynchronous I/O. When you call these functions and methods, they internally manage passing the call on to a worker thread. For instance, when you use the fs
module to request a file, the fs
module passes that call on to a worker thread, and that worker waits for its response, which it then presents back to the event loop that has been churning on without it in the meantime. All of this is abstracted away from you, the node developer, and some of it is abstracted away from the module developers through the use of libuv.
As pointed out by Denis Dollfus in the comments (from this answer to a similar question), the strategy used by libuv to achieve asynchronous I/O is not always a thread pool, specifically in the case of the http
module a different strategy appears to be used at this time. For our purposes here it's mainly important to note how the asynchronous context is achieved (by using libuv) and that the thread pool maintained by libuv is one of multiple strategies offered by that library to achieve asynchronicity.
On a mostly related tangent, there is a much deeper analysis of how node achieves asynchronicity, and some related potential problems and how to deal with them, in this excellent article. Most of it expands on what I've written above, but additionally it points out:
- Any external module that you include in your project that makes use of native C++ and libuv is likely to use the thread pool (think: database access)
- libuv has a default thread pool size of 4, and uses a queue to manage access to the thread pool - the upshot is that if you have 5 long-running DB queries all going at the same time, one of them (and any other asynchronous action that relies on the thread pool) will be waiting for those queries to finish before they even get started
- You can mitigate this by increasing the size of the thread pool through the
UV_THREADPOOL_SIZE
environment variable, so long as you do it before the thread pool is required and created:process.env.UV_THREADPOOL_SIZE = 10;
If you want traditional multi-processing or multi-threading in node, you can get it through the built in cluster
module or various other modules such as the aforementioned webworker-threads
, or you can fake it by implementing some way of chunking up your work and manually using setTimeout
or setImmediate
or process.nextTick
to pause your work and continue it in a later loop to let other processes complete (but that's not recommended).
Please note, if you're writing long running/blocking code in javascript, you're probably making a mistake. Other languages will perform much more efficiently.
Solution 2:
So I have an understanding of how Node.js works: it has a single listener thread that receives an event and then delegates it to a worker pool. The worker thread notifies the listener once it completes the work, and the listener then returns the response to the caller.
This is not really accurate. Node.js has only a single "worker" thread that does javascript execution. There are threads within node that handle IO processing, but to think of them as "workers" is a misconception. There are really just IO handling and a few other details of node's internal implementation, but as a programmer you cannot influence their behavior other than a few misc parameters such as MAX_LISTENERS.
My question is this: if I stand up an HTTP server in Node.js and call sleep on one of my routed path events (such as "/test/sleep"), the whole system comes to a halt. Even the single listener thread. But my understanding was that this code is happening on the worker pool.
There is no sleep mechanism in JavaScript. We could discuss this more concretely if you posted a code snippet of what you think "sleep" means. There's no such function to call to simulate something like time.sleep(30)
in python, for example. There's setTimeout
but that is fundamentally NOT sleep. setTimeout
and setInterval
explicitly release, not block, the event loop so other bits of code can execute on the main execution thread. The only thing you can do is busy loop the CPU with in-memory computation, which will indeed starve the main execution thread and render your program unresponsive.
How does Node.js decide to use a thread pool thread vs the listener thread? Why can't I write event code that sleeps and only blocks a thread pool thread?
Network IO is always asynchronous. End of story. Disk IO has both synchronous and asynchronous APIs, so there is no "decision". node.js will behave according to the API core functions you call sync vs normal async. For example: fs.readFile
vs fs.readFileSync
. For child processes, there are also separate child_process.exec
and child_process.execSync
APIs.
Rule of thumb is always use the asynchronous APIs. The valid reasons to use the sync APIs are for initialization code in a network service before it is listening for connections or in simple scripts that do not accept network requests for build tools and that kind of thing.