What's the best way of setting up a cron job to check that a long-running process is still going and if not, start it?

As per the title:

What's the best way of setting up a cron job to check that a long-running process is still going and if not, start it?

If I start a long-running process in cron, is it going to block, or does cron fork the process as an independent child?

Thanks!


Solution 1:

What's the best way of setting up a cron job to check that a long-running process is still going and if not, start it?

A simple approach is to have a straightforward script that checks whether the process is running or not, and then restart it when necessary.
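A minimal bash sketch of such a script is below. The function name `watchdog` and its calling convention are hypothetical. The health check is passed in as an arbitrary shell command (for example `pgrep -x mydaemon`), so the same wrapper works with whichever detection method suits your process.

```bash
#!/bin/bash
# Minimal cron watchdog sketch (function name and usage are hypothetical).
# Usage: watchdog CHECK_COMMAND PROGRAM [ARGS...]
# CHECK_COMMAND is any shell command that exits 0 when the service is healthy,
# e.g. 'pgrep -x mydaemon' or a pid-file test.
watchdog() {
  local check=$1
  shift
  # Healthy: do nothing and print nothing, so cron has nothing to mail.
  if bash -c "$check" >/dev/null 2>&1; then
    return 0
  fi
  # Unhealthy: report on stderr (cron mails this to you), then restart.
  echo "watchdog: '$check' failed, starting: $*" >&2
  # Redirect all three streams and background the program so the cron job
  # itself does not wait for the restarted process.
  "$@" </dev/null >/dev/null 2>&1 &
  # Non-zero status signals "it was not running and has been restarted".
  return 1
}

# Example invocation from a cron-run script (hypothetical daemon path):
#   watchdog 'pgrep -x mydaemon' /usr/local/bin/mydaemon
```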

(Sometimes it's best to actually validate that the process is running by means of a 'dummy transaction', e.g. to verify an SMTP process you might make a TCP port connection and check that it responds correctly.)
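For the SMTP case, a "dummy transaction" can be as small as connecting and checking for the `220` greeting. The sketch below assumes bash, because it relies on bash's `/dev/tcp/HOST/PORT` pseudo-device, which plain POSIX `sh` does not have. The function name `smtp_ok` and the 5-second read timeout are illustrative choices.

```bash
#!/bin/bash
# Sketch of a "dummy transaction" health check for an SMTP server.
# Bash-only: uses the /dev/tcp pseudo-device. Returns 0 only if the server
# sends a 220 greeting within 5 seconds. The function body runs in a subshell,
# so `exit` only leaves the check, and fd 3 is closed automatically.
smtp_ok() (
  host=${1:-127.0.0.1}
  port=${2:-25}
  # Open a read/write TCP connection on fd 3; fails if nothing is listening.
  # Note: the connect step itself has no timeout of its own.
  exec 3<>"/dev/tcp/$host/$port" || exit 1
  # Read the server greeting, giving up after 5 seconds.
  IFS= read -r -t 5 greeting <&3 || exit 1
  # End the session politely before checking the reply code.
  printf 'QUIT\r\n' >&3
  case $greeting in
    220*) exit 0 ;;
    *) exit 1 ;;
  esac
) 2>/dev/null
```

The same pattern works for any service with a simple request/response exchange; only the expected reply changes.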

But do watch out for differences between your environment as an interactive user and the environment cron(8) gives your script: cron typically provides a minimal PATH and does not read your shell's login profile.
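One way to see those differences is to have cron report its environment for you. The crontab fragment below is a sketch for Vixie cron / cronie style crontabs, which accept `NAME=value` lines. The PATH value and `/tmp/cron-env.txt` are arbitrary choices.

```
# Give every job below a known PATH instead of relying on cron's minimal default.
PATH=/usr/local/bin:/usr/bin:/bin

# Debugging aid: dump the environment cron actually provides, then compare it
# with the output of `env` in your interactive shell. Remove it afterwards.
* * * * * env > /tmp/cron-env.txt 2>&1
```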

To answer the second bit of your question:

If I start a long-running process in cron, is it going to block, or does cron fork the process as an independent child?

cron(8) will fork to execute a cron job, but unless your script or process 'detaches', cron will keep it as a child process until it exits. That's how cron is able to collect all of the job's output (stdout and stderr) and send it to you by email.
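A sketch of detaching from within a cron job is below. The function name `launch_detached` is hypothetical. Cron typically reads the job's output through a pipe until it is closed, so a background child that still holds stdout or stderr keeps the job open. Redirecting all three streams and backgrounding the program avoids that.

```bash
#!/bin/bash
# Sketch: start a program from a cron job so that the job itself can finish.
# A background child that still holds cron's stdout/stderr pipe keeps the
# job open, so every stream is redirected away from it.
launch_detached() {
  # nohup ignores SIGHUP; </dev/null >/dev/null 2>&1 releases the pipe;
  # the trailing & returns control immediately.
  nohup "$@" </dev/null >/dev/null 2>&1 &
  # Print the new process's PID so the caller can record it (e.g. in a pid file).
  echo "$!"
}
```

Usage: `pid=$(launch_detached /usr/local/bin/mydaemon)`, where the path is hypothetical. The command substitution returning promptly is itself a sign that the child no longer holds the output pipe.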

But I think what you were really asking is whether you could run the long-running process directly from cron. If you do that, you need to make sure that it can only start one copy of itself, and that it will quickly exit if a copy is already running.
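One common way to get "only one copy" is an exclusive lock that the running process holds for its whole lifetime. The sketch below assumes the `flock(1)` utility from util-linux, which is common on Linux but not part of POSIX. The function name `run_single` and the lock file path are hypothetical.

```bash
#!/bin/bash
# Sketch: make sure only one copy of a command runs at a time.
# Uses flock(1) from util-linux (common on Linux, not POSIX).
# Usage: run_single LOCKFILE PROGRAM [ARGS...]
run_single() {
  local lockfile=$1
  shift
  (
    # Open (creating if needed) the lock file on descriptor 9.
    exec 9>"$lockfile" || exit 1
    # -n: do not wait. If another copy holds the lock, exit quietly with 0
    # so that cron has nothing to mail.
    flock -n 9 || exit 0
    # The program inherits fd 9, so the lock is held for as long as it runs.
    exec "$@"
  )
}
```

Run from cron every minute, for example as `run_single /tmp/mydaemon.lock /usr/local/bin/mydaemon` (hypothetical paths), the first invocation keeps the lock. Later invocations exit immediately until the program dies. The kernel releases the lock when the process exits, so a crash never leaves a stale lock behind.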

Better solutions for keeping long-running processes running, if you're only worried about them exiting or crashing:

  • If your process can be made to stay attached, use init(1) via inittab(5) and the 'respawn' action. Many daemons even have a "no fork" option for this.
  • Or, if your OS doesn't have an inittab feature or you don't have access to it, use something like DJB's daemontools.
  • If you have the luxury of using Solaris 10 or OpenSolaris, you can use SMF. (This can even work with processes that do fork and detach.)
  • If it's your own code, you can write it to have a parent/child pair of processes, where the parent restarts the child whenever it receives a SIGCHLD.
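The parent/child idea from the last bullet can be approximated in bash. This is a substitution: rather than installing an explicit SIGCHLD handler, the sketch lets the shell block until the foreground child terminates and then starts it again. The function name `supervise` and its restart limit and delay parameters are hypothetical.

```bash
#!/bin/bash
# Sketch of a parent process that restarts its child whenever the child exits.
# Instead of an explicit SIGCHLD handler, the shell simply blocks until the
# foreground child terminates, then starts it again.
# Usage: supervise MAX_RESTARTS DELAY PROGRAM [ARGS...]
#   MAX_RESTARTS: -1 restarts forever; DELAY: seconds to pause between runs.
supervise() {
  local max_restarts=$1 delay=$2
  shift 2
  local restarts=0 status
  while :; do
    "$@"                       # run the child and wait for it to exit
    status=$?
    echo "supervise: '$1' exited with status $status" >&2
    if [ "$max_restarts" -ge 0 ] && [ "$restarts" -ge "$max_restarts" ]; then
      return "$status"         # give up after the allowed number of restarts
    fi
    restarts=$((restarts + 1))
    sleep "$delay"             # avoid a tight loop if the child dies at once
  done
}
```

The delay matters in practice: a child that fails immediately on start-up would otherwise be restarted in a busy loop.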

Solution 2:

A common idiom is for a long-running process to have a pid file: a file in /var/run (or similar) that contains just the pid, or process ID, of the programme. When the programme starts it creates the file, and when it stops it removes it. You can check whether the programme is running by seeing whether that file is there.
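If the long-running programme is itself a bash script, managing its own pid file can look like the sketch below. The function name `write_pidfile` and the example path are hypothetical, and writing under /var/run usually requires root.

```bash
#!/bin/bash
# Sketch: a bash program that records its PID on start-up and removes the
# pid file again when it exits.
# Call it near the top of the long-running script, e.g.
#   write_pidfile /var/run/myprog.pid   (hypothetical path; needs root)
write_pidfile() {
  local pidfile=$1
  echo "$$" > "$pidfile" || return 1
  # Remove the file when the shell exits. This replaces any existing EXIT
  # trap, and it cannot run after SIGKILL or a crash, which is how stale
  # pid files arise. printf %q quotes the path safely for the trap string.
  trap "rm -f -- $(printf '%q' "$pidfile")" EXIT
}
```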

This can also be used to see if the programme has crashed. If the file is there but no process with that pid is running, the programme stopped without removing the pid file, i.e. it most likely crashed. In this case, you can remove the pid file and restart the programme. However, this isn't foolproof: after the original process crashes, its PID can be reused by an unrelated new process, which makes the stale file look valid.
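A sketch of the staleness check is below. The function name `pid_alive` is hypothetical. `kill -0` tests whether a PID exists without sending a signal. The optional name comparison against `/proc/PID/comm` is a Linux-only heuristic that narrows the PID-reuse problem described above but cannot eliminate it.

```bash
#!/bin/bash
# Sketch: is the program named in a pid file still running?
# Usage: pid_alive PIDFILE [EXPECTED_NAME]
# Returns 0 if running, 1 if the file is missing, garbled, or stale.
pid_alive() {
  local pidfile=$1 expected_name=${2:-} pid=
  [ -r "$pidfile" ] || return 1
  read -r pid < "$pidfile" || true
  # Reject empty or non-numeric contents.
  case $pid in
    '' | *[!0-9]*) return 1 ;;
  esac
  # kill -0 sends no signal; it only tests whether the PID exists. It also
  # fails for another user's process, so run this as the same user (or root).
  kill -0 "$pid" 2>/dev/null || return 1
  # Optional, Linux-only guard against PID reuse: compare the kernel's short
  # command name (truncated to 15 characters) with the name we expect.
  if [ -n "$expected_name" ] && [ -r "/proc/$pid/comm" ]; then
    [ "$(cat "/proc/$pid/comm")" = "$expected_name" ] || return 1
  fi
  return 0
}
```

A cron job could then run something like `pid_alive /var/run/myprog.pid myprog || restart-command`, where every name shown is hypothetical.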

Solution 3:

Depending on how you detect your process, the cron job might look like:

* * * * * pidof executable || /usr/local/bin/executable

provided that your executable shows up under its own name in the process list. A more sensible way would be to use a pidfile and start-stop-daemon. Really, it all depends on the process in question. A while ago I also wrote a small process-maintenance daemon, dudki, for exactly this purpose.
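For the pidfile plus start-stop-daemon route, a root crontab entry might look like the sketch below. It assumes Debian's start-stop-daemon and reuses the executable path from the one-liner above. The pid file location is hypothetical. `--make-pidfile` is only reliable if the program stays in the foreground rather than forking itself.

```
# --start does nothing if a process matching both --pidfile and --exec is
# already running; --oknodo makes that case exit 0 instead of 1.
# --background plus --make-pidfile fork the program and record its PID.
* * * * * /sbin/start-stop-daemon --start --quiet --oknodo --background --make-pidfile --pidfile /var/run/executable.pid --exec /usr/local/bin/executable
```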

And no, cron won't block, but depending on the nature of your process you may want to background it anyway.