Automatically restart a Unix job if it goes down?

Solution 1:

If your program runs in the foreground, use Gerrit Pape's runit. Advantages:

  • It's pretty much bulletproof (based on Dan Bernstein's daemontools).
  • It runs on a wide variety of platforms (portable).
  • It is packaged for Ubuntu and Debian (along with the above).
  • It is relatively easy to configure (run script, log script, some symlinks).
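
As a rough sketch of what that configuration involves, the function below lays out a runit service directory: a run script that execs the program in the foreground, and a log/run script that hands its output to svlogd. The helper name make_runit_service, the service name myjob and the paths are made up for illustration, and the directory that runsvdir scans (/etc/service on Debian and Ubuntu) may differ on your system.

```shell
#!/bin/sh
# Sketch: create a runit service directory.
# make_runit_service is a hypothetical helper, not part of runit.
make_runit_service() {
    svdir=$1      # e.g. "$HOME/sv/myjob"
    cmd=$2        # command line of a program that stays in the foreground

    mkdir -p "$svdir/log/main"

    # run: runsv executes this script and runs it again whenever it exits.
    cat > "$svdir/run" <<EOF
#!/bin/sh
exec 2>&1
exec $cmd
EOF

    # log/run: svlogd reads the service's output and writes it to ./main.
    cat > "$svdir/log/run" <<'EOF'
#!/bin/sh
exec svlogd -tt ./main
EOF

    chmod +x "$svdir/run" "$svdir/log/run"
}
```

After something like make_runit_service "$HOME/sv/myjob" "/path/to/myjob", symlinking the directory into the scanned service directory should be enough for runit to pick it up, and sv status myjob reports its state.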

Solution 2:

I use Monit for this purpose; it's free and open source. It does what you need and much more.

What Monit can do

Monit can start a process if it does not run, restart a process if it does not respond and stop a process if it uses too many resources. You can use Monit to monitor files, directories and filesystems for changes, such as timestamp changes, checksum changes or size changes. You can also monitor remote hosts; Monit can ping a remote host and can check TCP/IP port connections and server protocols. Monit is controlled via an easy-to-use control file based on a free-format, token-oriented syntax. Monit logs to syslog or to its own log file and notifies you about error conditions and recovery status via customizable alert messages.
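
To give an idea of the control file, here is a minimal sketch that writes one "check process" entry. The helper name write_monit_check and all paths in the example call are assumptions for illustration; the job needs to write a pidfile for this kind of check to work.

```shell
#!/bin/sh
# Sketch: write a minimal Monit control file for a single process.
# write_monit_check is a hypothetical helper, not part of Monit.
write_monit_check() {
    name=$1; pidfile=$2; start=$3; stop=$4; out=$5

    cat > "$out" <<EOF
set daemon 60
check process $name with pidfile $pidfile
    start program = "$start"
    stop program = "$stop"
EOF

    # Monit rejects control files with permissions wider than 0700.
    chmod 600 "$out"
}

# Example call (all paths are placeholders):
# write_monit_check myjob /path/to/myjob.pid "/path/to/myjob start" "/path/to/myjob stop" "$HOME/.monitrc"
```

You can check the syntax with monit -t -c "$HOME/.monitrc" before starting Monit with the same -c option.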

I also like their design philosophy:

It is important for a system monitoring tool to just work - all the time and you should be able to trust it to do so. A system monitoring tool needs to be non-intrusive and you should be able to forget about it once it's installed. That is, until sshd dies on your co-located server, 50 miles away. When this happens, it is good to know that you have installed this extra layer of security and protection - just wait a few seconds and Monit will restart the sshd daemon. It is also helpful to get an alert mail before the server disks are full or if your http server suddenly is slashdotted.

Monit is designed as an autonomous system and does not depend on plugins nor any special libraries to run. Instead it works right out of the box and can utilize existing infrastructure already on your system. For instance, Monit will easily integrate with init and can use existing runlevel rc-scripts to manage services. There is also flexibility for those special cases when you need a certain setup for a service.

Monit compiles and runs on most flavors of UNIX. It is a small program and weighs in at just over 300kB. There is support for compiling with glibc replacements such as uClibc if you need it to be even smaller.


Since you do not have root access, a script like this may work for your requirement of:

"If the job is not currently running, then start the job"

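# Count processes whose command line contains the full httpd path;
# "grep -v grep" drops the grep process itself from the count.
if [ "$(ps ax | grep -v grep | grep -c "/usr/local/apache2/bin/httpd")" -eq 0 ]
then
        echo "httpd service not running"
        apachectl start
fi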

The above is code I created and tested with cron and the Apache httpd daemon. It simply searches for your string in the current list of processes. If 0 lines are found, the service isn't running, so the script starts it. Make sure to include grep -v grep to eliminate your own search from the process output. Use the entire path to the binary to ensure that it is the service being found by your query. If you only search for httpd, for example, then having httpd.conf open in vim will make the script think the httpd service is running when it really isn't. Of course, your method of starting the service will be different too.
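
If pgrep is available on your system, the same check can be written without the ps and grep pipeline. This is a variation on the script above, not what I originally tested; ensure_running is a hypothetical helper name and the apachectl path in the example is a guess based on the httpd path. Keep in mind that cron usually runs jobs with a very short PATH, so full paths are safer.

```shell
#!/bin/sh
# Sketch: start a job only if no matching process exists.
# ensure_running is a hypothetical helper name.
ensure_running() {
    pattern=$1    # regex matched against the full command line
    shift         # remaining arguments: the command that starts the job

    # pgrep -f matches against the whole command line and never reports
    # itself, so no "grep -v grep" step is needed.
    if ! pgrep -f "$pattern" >/dev/null; then
        echo "no process matching '$pattern'; starting it"
        "$@"
    fi
}

# Example call (adjust the paths to your install):
# ensure_running "/usr/local/apache2/bin/httpd" /usr/local/apache2/bin/apachectl start
```

The warning about loose patterns still applies: a pattern of just httpd would also match an editor that has httpd.conf open.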

Solution 3:

This approach is quick and cheap, but not bulletproof:

#!/usr/bin/perl -w
use strict;
# List this user's processes and start mzscheme if none is found.
my $ps = `ps x`;
if ($ps !~ /mzscheme/) {
        system('~/utils/src/plt/bin/mzscheme &');
}

I put that script in a cron file.
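
For the cron side, something like the following can add the entry without duplicating it on repeated runs. This is only a sketch: add_cron_entry is a hypothetical helper, and the script path and the five-minute schedule in the example are assumptions, not what I actually use.

```shell
#!/bin/sh
# Sketch: add a cron entry only if it is not already present.
# add_cron_entry is a hypothetical helper; it reads a crontab on stdin
# and writes the updated crontab to stdout.
add_cron_entry() {
    entry=$1
    current=$(cat)

    # Pass the existing entries through unchanged.
    [ -n "$current" ] && printf '%s\n' "$current"

    # -F: fixed string, -x: the whole line must match.
    printf '%s\n' "$current" | grep -qxF -e "$entry" || printf '%s\n' "$entry"
}

# Example: run the check every five minutes (script path is a placeholder):
# crontab -l 2>/dev/null | add_cron_entry "*/5 * * * * $HOME/bin/check_mzscheme.pl" | crontab -
```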

Solution 4:

There are also tools designed specifically to act as a watchdog, which can even run scripts that don't create PID files as services. One example of such a tool is supervisor.
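
As a minimal sketch, a supervisor program section for one job can be generated like this. The helper name write_supervisor_program and the example names are assumptions; the command has to stay in the foreground, since supervisor watches the process it started itself.

```shell
#!/bin/sh
# Sketch: write a supervisor [program:x] section for one job.
# write_supervisor_program is a hypothetical helper, not part of supervisor.
write_supervisor_program() {
    name=$1; cmd=$2; out=$3

    cat > "$out" <<EOF
[program:$name]
command=$cmd
autostart=true
autorestart=true
EOF
}

# Example call (paths are placeholders):
# write_supervisor_program myjob "/path/to/myjob" /path/to/conf.d/myjob.conf
```

The file is meant to be picked up through the [include] section of your supervisord.conf; supervisorctl reread followed by supervisorctl update should then load it.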