How do I debug error ECONNRESET in Node.js?

I'm running an Express.js application for a chat webapp and I get the following error randomly, around 5 times in 24 hours. The Node process is wrapped in forever and restarts itself immediately.

The problem is that restarting Express kicks my users out of their rooms and nobody wants that.

The web server is proxied by HAProxy. There are no socket stability issues; I'm just using the websocket and flashsocket transports. I cannot reproduce this on purpose.

This is the error with Node v0.10.11:

            throw er; // Unhandled 'error' event
    Error: read ECONNRESET     // alternatively it's a 'write'
        at errnoException (net.js:900:11)
        at TCP.onread (net.js:555:19)
    error: Forever detected script exited with code: 8
    error: Forever restarting script for 2 time

EDIT (2013-07-22)

Added both a client error handler and an uncaught exception handler. It seems that this one catches the error:

    process.on('uncaughtException', function (err) {
      console.error(err.stack);
      console.log("Node NOT Exiting...");
    });

So I suspect the cause is not the client sockets but an HTTP request that I make to another server, or a MySQL/Redis connection. The problem is that the error stack doesn't help me identify which part of my code is at fault. Here is the log output:

    Error: read ECONNRESET
        at errnoException (net.js:900:11)
        at TCP.onread (net.js:555:19)

How do I know what causes this? How do I get more out of the error?

Ok, not very verbose but here's the stacktrace with Longjohn:

    Exception caught: Error ECONNRESET
    { [Error: read ECONNRESET]
      code: 'ECONNRESET',
      errno: 'ECONNRESET',
      syscall: 'read',
       [ { receiver: [Object],
           fun: [Function: errnoException],
           pos: 22930 },
         { receiver: [Object], fun: [Function: onread], pos: 14545 },
         { receiver: [Object],
           fun: [Function: fireErrorCallbacks],
           pos: 11672 },
         { receiver: [Object], fun: [Function], pos: 12329 },
         { receiver: [Object], fun: [Function: onread], pos: 14536 } ],
       { [Error]
         id: 1061835,
         location: 'fireErrorCallbacks (net.js:439)',
         __location__: 'process.nextTick',
         __previous__: null,
         __trace_count__: 1,
         __cached_trace__: [ [Object], [Object], [Object] ] } }

Here I serve the flash socket policy file:

    net = require("net")
    net.createServer( (socket) =>
      socket.write("<?xml version=\"1.0\"?>\n")
      socket.write("<!DOCTYPE cross-domain-policy SYSTEM \"\">\n")
      socket.write("<allow-access-from domain=\"*\" to-ports=\"*\"/>\n")
      socket.end()
    ).listen(843)  # port is an assumption: 843 is the conventional Flash policy port

Can this be the cause?

Solution 1:

You might have guessed it already: it's a connection error.

"ECONNRESET" means the other side of the TCP conversation abruptly closed its end of the connection. This is most probably due to one or more application protocol errors. You could look at the API server logs to see if it complains about something.

But since you are also looking for a way to inspect the error and potentially debug the problem, you should take a look at "How to debug a socket hang up error in NodeJS?", which was posted on Stack Overflow in relation to a similar question.

Quick and dirty solution for development:

Use longjohn: you get long stack traces that include the async operations.

Clean and correct solution: Technically, in Node, whenever an 'error' event is emitted and no one listens to it, it will throw. To make it not throw, put a listener on it and handle it yourself. That way you can log the error with more information.
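The throw-unless-listened behaviour described above can be seen with a plain `EventEmitter`. This is only a minimal sketch: the emitters and the `emitError` helper are hypothetical stand-ins, and the error is constructed by hand rather than coming from a real connection.

```javascript
const { EventEmitter } = require('events');

// Emit an 'error' event and report whether emit() threw.
function emitError(emitter) {
  try {
    emitter.emit('error', new Error('read ECONNRESET'));
    return 'handled';
  } catch (err) {
    return 'thrown: ' + err.message;
  }
}

// No 'error' listener: emit() throws the error object itself.
const bare = new EventEmitter();
const withoutListener = emitError(bare);

// With an 'error' listener: the error is delivered to the handler instead.
const guarded = new EventEmitter();
const seen = [];
guarded.on('error', (err) => seen.push(err.message));
const withListener = emitError(guarded);

console.log(withoutListener);
console.log(withListener, seen);
```

On a real socket the listener would be attached the same way, with `socket.on('error', handler)`, on every socket your code creates or accepts.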

To have one listener for a group of calls, you can use domains, which also catch other errors at runtime. Make sure each async operation related to http (server/client) runs in a different domain context from the other parts of the code; the domain will automatically listen for the error events and propagate them to its own handler. So you only listen to that handler to get the error data. You also get more information for free.
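A minimal sketch of the domain approach, assuming the built-in `domain` module is still available (it is deprecated in current Node.js releases but has not been removed). `fakeSocket` is a hypothetical stand-in for a real connection, and the error is emitted by hand.

```javascript
const domain = require('domain'); // deprecated, but still shipped with Node.js
const { EventEmitter } = require('events');

const caught = [];
const d = domain.create();

// One handler for every emitter bound to this domain.
d.on('error', (err) => {
  caught.push({
    message: err.message,
    // The domain attaches the emitter that raised the error.
    viaEmitter: err.domainEmitter !== undefined,
  });
});

d.run(() => {
  // Emitters created inside run() are implicitly bound to the domain.
  const fakeSocket = new EventEmitter(); // hypothetical stand-in for a socket
  // No 'error' listener on fakeSocket, so the domain receives the error.
  fakeSocket.emit('error', new Error('read ECONNRESET'));
});

console.log(caught);
```

The extra `domainEmitter` property is part of the "more information for free": it tells you which emitter raised the error, which the bare stack trace does not.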

EDIT (2013-07-22)

As I wrote above:

"ECONNRESET" means the other side of the TCP conversation abruptly closed its end of the connection. This is most probably due to one or more application protocol errors. You could look at the API server logs to see if it complains about something.

What could also be the case: at random times, the other side is overloaded and simply kills the connection as a result. Whether that's the case depends on what exactly you're connecting to…

But one thing's for sure: you indeed have a read error on your TCP connection which causes the exception. You can see that by looking at the error code you posted in your edit, which confirms it.
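Since the stack trace alone does not say which connection failed, one option is to log the `code` and `syscall` fields together with a label you choose per connection. The helper below is a hypothetical sketch: `describeSocketError` and the `policy-server` label are illustrative names, and the sample error is built by hand to mirror the log in the question.

```javascript
// Build a one-line summary from the fields Node sets on system errors.
// `label` is whatever name you give the connection when attaching the handler.
function describeSocketError(err, label) {
  const code = err.code || 'UNKNOWN';
  const syscall = err.syscall || 'unknown syscall';
  return `[${label}] ${code} during ${syscall}: ${err.message}`;
}

// Simulated error shaped like the one in the question's log.
const sample = Object.assign(new Error('read ECONNRESET'), {
  code: 'ECONNRESET',
  syscall: 'read',
});

const summary = describeSocketError(sample, 'policy-server');
console.log(summary);
```

Attaching one such handler per outgoing or incoming connection, each with a different label, would show which of them is being reset.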

Solution 2:

A simple TCP server I had for serving the flash policy file was causing this. I can now catch the error using a handler:

    # serving the flash policy file
    net = require("net")

    net.createServer((socket) =>
      # just added
      socket.on("error", (err) =>
        console.log("Caught flash policy server socket error: ")
        console.log(err.stack)
      )

      socket.write("<?xml version=\"1.0\"?>\n")
      socket.write("<!DOCTYPE cross-domain-policy SYSTEM \"\">\n")
      socket.write("<allow-access-from domain=\"*\" to-ports=\"*\"/>\n")
      socket.end()
    ).listen(843)  # port is an assumption: 843 is the conventional Flash policy port