Make node.js not exit on error
I am working on a websocket oriented node.js server using Socket.IO. I noticed a bug where certain browsers aren't following the correct connect procedure to the server, and the code isn't written to gracefully handle it, and in short, it calls a method to an object that was never set up, thus killing the server due to an error.
My concern isn't with the bug in particular, but the fact that when such errors occur, the entire server goes down. Is there anything I can do on a global level in node to make it so if an error occurs it will simply log a message, perhaps kill the event, but the server process will keep on running?
I don't want other users' connections to go down due to one clever user exploiting an uncaught error in a large included codebase.
Solution 1:
You can attach a listener to the uncaughtException
event of the process object.
Code taken from the actual Node.js API reference (it's the second item under "process"):
process.on('uncaughtException', function (err) {
console.log('Caught exception: ', err);
});
setTimeout(function () {
console.log('This will still run.');
}, 500);
// Intentionally cause an exception, but don't catch it.
nonexistentFunc();
console.log('This will not run.');
All you've got to do now is to log it or do something with it, in case you know under what circumstances the bug occurs, you should file a bug over at Socket.IO's GitHub page:
https://github.com/LearnBoost/Socket.IO-node/issues
Solution 2:
Using uncaughtException is a very bad idea.
The best alternative is to use domains in Node.js 0.8. If you're on an earlier version of Node.js rather use forever to restart your processes or even better use node cluster to spawn multiple worker processes and restart a worker on the event of an uncaughtException.
From: http://nodejs.org/api/process.html#process_event_uncaughtexception
Warning: Using 'uncaughtException' correctly
Note that 'uncaughtException' is a crude mechanism for exception handling intended to be used only as a last resort. The event should not be used as an equivalent to On Error Resume Next. Unhandled exceptions inherently mean that an application is in an undefined state. Attempting to resume application code without properly recovering from the exception can cause additional unforeseen and unpredictable issues.
Exceptions thrown from within the event handler will not be caught. Instead the process will exit with a non-zero exit code and the stack trace will be printed. This is to avoid infinite recursion.
Attempting to resume normally after an uncaught exception can be similar to pulling out of the power cord when upgrading a computer -- nine out of ten times nothing happens - but the 10th time, the system becomes corrupted.
The correct use of 'uncaughtException' is to perform synchronous cleanup of allocated resources (e.g. file descriptors, handles, etc) before shutting down the process. It is not safe to resume normal operation after 'uncaughtException'.
To restart a crashed application in a more reliable way, whether uncaughtException is emitted or not, an external monitor should be employed in a separate process to detect application failures and recover or restart as needed.
Solution 3:
I just did a bunch of research on this (see here, here, here, and here) and the answer to your question is that Node will not allow you to write one error handler that will catch every error scenario that could possibly occur in your system.
Some frameworks like express will allow you to catch certain types of errors (when an async method returns an error object), but there are other conditions that you cannot catch with a global error handler. This is a limitation (in my opinion) of Node and possibly inherent to async programming in general.
For example, say you have the following express handler:
app.get("/test", function(req, res, next) {
require("fs").readFile("/some/file", function(err, data) {
if(err)
next(err);
else
res.send("yay");
});
});
Let's say that the file "some/file" does not actually exist. In this case fs.readFile will return an error as the first argument to the callback method. If you check for that and do next(err) when it happens, the default express error handler will take over and do whatever you make it do (e.g. return a 500 to the user). That's a graceful way to handle an error. Of course, if you forget to call next(err)
, it doesn't work.
So that's the error condition that a global handler can deal with, however consider another case:
app.get("/test", function(req, res, next) {
require("fs").readFile("/some/file", function(err, data) {
if(err)
next(err);
else {
nullObject.someMethod(); //throws a null reference exception
res.send("yay");
}
});
});
In this case, there is a bug if your code that results in you calling a method on a null object. Here an exception will be thrown, it will not be caught by the global error handler, and your node app will terminate. All clients currently executing requests on that service will get suddenly disconnected with no explanation as to why. Ungraceful.
There is currently no global error handler functionality in Node to handle this case. You cannot put a giant try/catch
around all your express handlers because by the time your asyn callback executes, those try/catch
blocks are no longer in scope. That's just the nature of async code, it breaks the try/catch error handling paradigm.
AFAIK, your only recourse here is to put try/catch
blocks around the synchronous parts of your code inside each one of your async callbacks, something like this:
app.get("/test", function(req, res, next) {
require("fs").readFile("/some/file", function(err, data) {
if(err) {
next(err);
}
else {
try {
nullObject.someMethod(); //throws a null reference exception
res.send("yay");
}
catch(e) {
res.send(500);
}
}
});
});
That's going to make for some nasty code, especially once you start getting into nested async calls.
Some people think that what Node does in these cases (that is, die) is the proper thing to do because your system is in an inconsistent state and you have no other option. I disagree with that reasoning but I won't get into a philosophical debate about it. The point is that with Node, your options are lots of little try/catch
blocks or hope that your test coverage is good enough so that this doesn't happen. You can put something like upstart or supervisor in place to restart your app when it goes down but that's simply mitigation of the problem, not a solution.
Node.js has a currently unstable feature called domains that appears to address this issue, though I don't know much about it.
Solution 4:
I've just put together a class which listens for unhandled exceptions, and when it see's one it:
- prints the stack trace to the console
- logs it in it's own logfile
- emails you the stack trace
- restarts the server (or kills it, up to you)
It will require a little tweaking for your application as I haven't made it generic as yet, but it's only a few lines and it might be what you're looking for!
Check it out!
Note: this is over 4 years old at this point, unfinished, and there may now be a better way - I don't know!)
process.on
(
'uncaughtException',
function (err)
{
var stack = err.stack;
var timeout = 1;
// print note to logger
logger.log("SERVER CRASHED!");
// logger.printLastLogs();
logger.log(err, stack);
// save log to timestamped logfile
// var filename = "crash_" + _2.formatDate(new Date()) + ".log";
// logger.log("LOGGING ERROR TO "+filename);
// var fs = require('fs');
// fs.writeFile('logs/'+filename, log);
// email log to developer
if(helper.Config.get('email_on_error') == 'true')
{
logger.log("EMAILING ERROR");
require('./Mailer'); // this is a simple wrapper around nodemailer http://documentup.com/andris9/nodemailer/
helper.Mailer.sendMail("GAMEHUB NODE SERVER CRASHED", stack);
timeout = 10;
}
// Send signal to clients
// logger.log("EMITTING SERVER DOWN CODE");
// helper.IO.emit(SIGNALS.SERVER.DOWN, "The server has crashed unexpectedly. Restarting in 10s..");
// If we exit straight away, the write log and send email operations wont have time to run
setTimeout
(
function()
{
logger.log("KILLING PROCESS");
process.exit();
},
// timeout * 1000
timeout * 100000 // extra time. pm2 auto-restarts on crash...
);
}
);