Performance of NodeJS with large amount of callbacks

I am working on a NodeJS application. There is a specific RESTful API (GET) that, when triggered by the user, requires the server to do about 10-20 network operations to pull information from different sources. All these network operations are async callbacks, and once they are ALL finished, the result is consolidated by the nodejs app and sent back to the client. All these operations are started in parallel via the async.map function.

I just want to understand, since nodejs is single threaded, and it does not make use of multi-core machines (at least not without clustering), how does node scale when it has many callbacks to process? Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel as well as the main thread?

The reason I ask is that I see the performance of my 20 callbacks deteriorate from the first callback to the last one. For example, the first network operation (out of the 10-20) takes 141ms to complete, whereas the last one takes about 4 seconds (measured as the time from when the function is executed until the callback of the function returns a value or an error). They are all the same network operation hitting the same data source, so the data source is not the bottleneck. I know for a fact that the data source takes no more than 200ms to respond to a single request.

I found this thread, so it looks to me that the one single thread needs to address all callbacks AND new requests coming up.

So my question is, for operations that will trigger many callbacks, what is the best practice in optimizing their performance?


For network operations node.js is effectively single threaded. However, there is a persistent misunderstanding that handling I/O requires constant CPU resources. The core of your question boils down to:

Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel as well as the main thread?

The answer is yes and no. Yes, callbacks are only executed when the main thread is idle. No, the "processing" is not done when the thread is idle. To be specific: there is no "processing" - it takes zero CPU time for node to "process" thousands of callbacks if what you mean by "process" is waiting.

How asynchronous I/O works (in any programming language)

Hardware

If we really need to understand how node (or browser) internals work we must unfortunately first understand how computers work - from the hardware to the operating system. Yes, this is going to be a deep dive so bear with me.

It all began with the invention of interrupts...

It was a great invention, but also a Box of Pandora - Edsger Dijkstra

Yes, the quote above is from the same "Goto considered harmful" Dijkstra. From the very beginning introducing asynchronous operation to computer hardware was considered a very hard topic even for some of the legends in the industry.

Interrupts were introduced to speed up I/O operations. Rather than needing to poll some input with software (taking CPU time away from useful work), the hardware sends a signal to the CPU to tell it an event has occurred. The CPU then suspends the currently running program and executes another piece of code to handle the interrupt - thus we call these functions interrupt handlers. And the word "handler" has stuck all the way up the stack to GUI libraries, which call callback functions "event handlers".

If you've been paying attention you will notice that this concept of an interrupt handler is actually a callback. You configure the CPU to call a function at some later time when an event happens. So even callbacks are not a new concept - they are way older than C.

OS

Interrupts make modern operating systems possible. Without interrupts there would be no way for the CPU to temporarily stop your program to run the OS (well, there is cooperative multitasking, but let's ignore that for now). How an OS works is that it sets up a hardware timer in the CPU to trigger an interrupt and then it tells the CPU to execute your program. It is this periodic timer interrupt that runs your OS. Apart from the timer, the OS (or rather device drivers) sets up interrupts for I/O. When an I/O event happens the OS takes over your CPU (or one of your CPUs in a multi-core system) and checks its data structures to decide which process it needs to execute next to handle the I/O (this is called preemptive multitasking).

So, handling network connections is not even the job of the OS - the OS just keeps track of connections in its data structures (or rather, the networking stack). What really handles network I/O is your network card, your router, your modem, your ISP etc. So waiting for I/O takes zero CPU resources. It just takes up some RAM to remember which program owns which socket.

Processes

Now that we have a clear picture of this we can understand what it is that node does. Various OSes have various different APIs that provide asynchronous I/O - from overlapped I/O on Windows to poll/epoll on Linux to kqueue on BSD to the cross-platform select(). Node internally uses libuv as a high-level abstraction over these APIs.

How these APIs work is similar, though the details differ. Essentially they provide a function that, when called, will block your thread until the OS sends an event to it. So yes, even non-blocking I/O blocks your thread. The key here is that blocking I/O will block your thread in multiple places but non-blocking I/O blocks your thread in only one place - where you wait for events.

What this allows you to do is design your program in an event-oriented manner. This is similar to how interrupts allow OS designers to implement multitasking. In effect, asynchronous I/O is to frameworks what interrupts are to OSes. It allows node to spend exactly 0% CPU time to process (wait for) I/O. This is what makes node fast - it's not really faster but does not waste time waiting.
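To make the "waiting costs nothing" point concrete, here is a minimal sketch. It assumes timers are a fair stand-in for network waits (no real I/O is performed), and fakeRequest and runAll are hypothetical names made up for this illustration. On an otherwise idle machine the total should come out close to one delay, not 20 times the delay, because all the waits overlap.

```javascript
// Simulates a network call with a timer; nothing is actually sent anywhere.
function fakeRequest(id, delayMs) {
  return new Promise(resolve => setTimeout(() => resolve(id), delayMs));
}

// Starts `count` simulated requests at once and reports the total wall time.
async function runAll(count, delayMs) {
  const start = Date.now();
  const ids = Array.from({ length: count }, (_, i) => i);
  const results = await Promise.all(ids.map(id => fakeRequest(id, delayMs)));
  return { results, elapsedMs: Date.now() - start };
}

runAll(20, 100).then(({ elapsedMs }) => {
  console.log(`20 simulated requests took ${elapsedMs} ms in total`);
});
```

The callbacks here do almost no work, so the single thread is never the bottleneck.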

Callback processing

With the understanding we now have of how node handles network I/O we can understand how callbacks affect performance.

  1. There is zero CPU penalty having thousands of callbacks waiting

    Of course, node still needs to maintain data structures in RAM to keep track of all the callbacks, so callbacks do have a memory penalty.

  2. Processing the return value from callbacks is done in a single thread

    This has some advantages and some drawbacks. It means node does not have to worry about race conditions and thus node does not internally use any semaphores or mutexes to guard data access. The disadvantage is that any CPU intensive javascript will block all other operations.

You mention that:

I see the performance of my 20 callbacks deteriorate from the first callback to the last one

The callbacks are all executed sequentially and synchronously in the main thread (only the waiting is actually done in parallel). Thus it could be that your callback is doing some CPU intensive calculations and the total execution time of all callbacks is actually 4 seconds.
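A rough way to see this effect in isolation is the sketch below. It assumes a busy-wait loop is a reasonable stand-in for CPU-heavy callback code, and burnCpu and measure are hypothetical names, not anything from your application. Every simulated response arrives at about the same time, yet each callback reports a later completion time than the previous one because they run one after another.

```javascript
// Busy-waits to stand in for CPU-intensive work inside a callback.
function burnCpu(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Schedules `count` timers that all expire after `ioDelayMs`, runs `cpuMs`
// of synchronous work in each callback, and records when each one finishes
// relative to the moment the timers were started.
function measure(count, ioDelayMs, cpuMs) {
  return new Promise(resolve => {
    const start = Date.now();
    const latencies = [];
    for (let i = 0; i < count; i++) {
      setTimeout(() => {
        burnCpu(cpuMs);
        latencies.push(Date.now() - start);
        if (latencies.length === count) resolve(latencies);
      }, ioDelayMs);
    }
  });
}

measure(10, 50, 20).then(latencies => console.log(latencies));
```

If timing your real callbacks (not the requests) shows a similar staircase, the callbacks themselves are where the time goes.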

However, I rarely see this kind of issue for that number of callbacks. It's still possible, since I don't know what you're doing in your callbacks. I just think it's unlikely.

You also mention:

until the callback of the function returns a value or an error

One likely explanation is that your network resource cannot handle that many simultaneous connections. You may not think it's much since it's only 20 connections but I've seen plenty of services that would crash at 10 requests/second. The problem is all 20 requests are simultaneous.

You can test this by taking node out of the picture and using a command line tool to send 20 simultaneous requests. Something like curl or wget:

# assuming you're running bash:
for x in `seq 1 20`;do curl -o /dev/null -w "Connect: %{time_connect} Start: %{time_starttransfer} Total: %{time_total} \n" http://example.com & done

Mitigation

If it turns out that making the 20 requests simultaneously is stressing the other service, what you can do is limit the number of simultaneous requests.

You can do this by batching your requests:

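// fetchNetworkResource() stands in for whatever function performs one
// request and returns a promise.
async function fetchInBatches () {
    let input = [/* some values we need to process */];
    let result = [];

    while (input.length) {
        // splice() removes the next 3 items from input, so the loop ends
        let batch = input.splice(0,3); // make 3 requests in parallel

        // wait for the whole batch to finish before starting the next one
        let batchResult = await Promise.all(batch.map(x => {
            return fetchNetworkResource(x);
        }));

        result = result.concat(batchResult);
    }
    return result;
}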
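
One limitation of batching is that each batch waits for its slowest request before the next batch starts. A possible alternative, sketched here under assumptions, is a sliding window that starts a new request as soon as any one finishes. mapWithLimit and fakeFetch are hypothetical names invented for this example; fakeFetch uses a timer in place of a real request. Since you are already using the async library, its mapLimit function should give you similar behaviour without writing this yourself.

```javascript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the input.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has picked up yet

  // Each runner keeps taking the next unclaimed item until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Hypothetical stand-in for a real request: resolves after 20 ms and
// tracks how many calls are pending at the same time.
let inFlight = 0;
let maxInFlight = 0;
function fakeFetch(x) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise(resolve => setTimeout(() => {
    inFlight--;
    resolve(x * 2);
  }, 20));
}

mapWithLimit([1, 2, 3, 4, 5, 6, 7], 3, fakeFetch).then(results => {
  console.log(results, 'max in flight:', maxInFlight);
});
```

Note that in this sketch, if one request rejects, the returned promise rejects immediately while the other runners keep going, so add error handling inside the worker if you need partial results.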