What specifically makes Node.js more scalable than Apache?

To be honest I've not understood it completely yet - and I even do understand how Node.js works, as a single thread using the event model. I just don't get how this is better than Apache, and how it scales horizontally if it's single-threaded.


Solution 1:

I've found that this blog post by Tomislav Capan explains it very well:
Why The Hell Would I Use Node.js? A Case-by-Case Introduction

My interpretation of the gist of it, for Node 0.10, compared to Apache:

The good parts

  • Node.js avoids spinning up threads for each request, or does not need to handle pooling of requests to a set of threads like Apache does. Therefore it has less overhead to handle requests, and excels at responding quickly.
  • Node.js can delegate execution of the request to a separate component, and focus on new requests until the delegated component returns with the processed result. This is asynchronous code, and is made possible by the eventing model. Apache executes requests in serial within a pool, and cannot reuse the thread when one of its modules is simply waiting for a task to complete. Apache will then queue requests until a thread in the pool becomes available again.
  • Node.js talks JavaScript and is therefore very fast in passing through and manipulating JSON retrieved from external web API sources like MongoDB, reducing time needed per request. Apache modules, like PHP, may need more time, because they cannot efficiently parse and manipulate JSON because they need marshalling to process the data.

The bad parts

Note: most of the bad parts listed below will be improved with the upcoming version 0.12, something to keep aware of.

  • Node.js sucks at computational intensive tasks, because whenever it does something long running, it will queue all other incoming requests, due to its single thread. Apache will generally have more threads available, and the OS will neatly and fairly schedule CPU time between these threads, still allowing new threads to be handled, albeit a bit slower. Except when all available threads in Apache are handling requests, then Apache will also start queueing requests.
  • Node.js doesn't fully utilize multi-core CPUs, unless you make a Node.js cluster or spin up child processes. Ironically, if you do the latter two, you may add more orchestrating overhead, the same issue that Apache has. Logically you could also spin up more Node.js processes, but this is not managed by Node.js. You would have to test your code to see what works better; 1) multi-threading from within Node.js with clusters and child processes, or 2) multiple Node.js processes.

Mitigations

All server platforms have an upper limit. Node.js and Apache both will reach it at some point.

  • Node.js will reach it the fastest when you have heavy computational tasks.
  • Apache will reach it the fastest when you throw tons of small requests at it that require long serial execution.

Three things you could do to scale the throughput of Node.js

  1. Utilize multi-core CPUs, by either setting up a cluster, use child processes, or use a multi-process orchestrator like Phusion Passenger.
  2. Setup worker roles connected with a message queue. This will be the most effective solution against computational intensive long running requests; off-load them to a worker farm. This will split up your servers in two parts; 1) public facing clerical servers that accept requests from users, and 2) private worker servers handling long running tasks. Both are connected with a message queue. The clerical servers add messages (incoming long-running requests) to the queue. The worker roles listen for incoming messages, handle those, and may return the result into the message queue. If request/response is needed, then the clerical server could asynchronously wait for the response message to arrive in the message queue. Examples of message queues are RabbitMQ and ZeroMQ.
  3. Setup a load balancer and spin up more servers. Now that you efficiently use hardware and delegate long running tasks, you can scale horizontally. If you have a load balancer, you can add more clerical servers. Using a message queue, you can add more worker servers. You could even set this up in the cloud so that you could scale on demand.

Solution 2:

It depends on how you use it. Node.js is single threaded by default, but using the (relatively) new cluster module you can scale horizontally across multiple threads.

Furthermore, your database needs will also dictate how effective scaling is with node. For example, using MySQL with node.js won't get you nearly as much benefit as using MongoDB, because of the event driven nature of both MongoDB and node.js.

The following link has a lot of nice benchmarks of systems with different setups: http://www.techempower.com/benchmarks/

Node.js doesn't rank the highest but compared to other setups using nginx (no apache on their tables, but close enough) it does pretty well.

Again though, it highly depends on your needs. I believe if you are simply serving static websites it is recommend you stick with a more traditional stack. However people have done some amazing things with node.js for other needs: http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/ (c10k? ha!)

Edit: It is worth mentioning that you really aren't 'replacing' just apache with node.js. You would be replacing apache AND php (in a typical lamp stack).