monit: check process without pidfile

Solution 1:

In monit, you can match on a string for processes that do not have a PID file. For example, for a process named "myprocessname":

check process myprocessname
        matching "myprocessname"
        start program = "/etc/init.d/myprocessname start"
        stop program = "/usr/bin/killall myprocessname"
        if cpu usage > 95% for 10 cycles then restart

One option is to check whether CPU usage stays above a certain level for 10 monitoring cycles (of 30 seconds each) and then restart or kill the process. Alternatively, you could use monit's timestamp test on a file related to the process.

Solution 2:

There is no ready-to-use tool with that functionality. Let's assume you want to kill php-cgi scripts that run longer than a minute. Do this:

pgrep php-cgi | xargs ps -o pid,time | perl -ne 'print "$1 " if /^\s*([0-9]+) ([0-9]+:[0-9]+:[0-9]+)/ && $2 gt "00:01:00"' | xargs kill

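pgrep selects processes by name, and ps -o pid,time prints the accumulated CPU time for each PID. The perl one-liner then parses each line, extracts the time, and prints the PID if the time is greater than the defined limit. The result is passed to kill. Note that the time column reports CPU time, not wall-clock run time (ps reports the latter as etime).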
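The one-liner above compares times as strings, which works while ps prints a plain HH:MM:SS field. ps can also print a day prefix (for example 1-02:03:04), which the regex would not match. A minimal sketch of a numeric comparison follows; the function names to_seconds and pids_over_limit are hypothetical helpers made up for this example, not part of any tool.

```shell
#!/bin/sh
# Sketch only: to_seconds and pids_over_limit are hypothetical helper names.

# Convert a ps time field of the form [[DD-]HH:]MM:SS to whole seconds.
# Prints the result; returns non-zero if the field has an unexpected shape.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }
        NF == 2 { print $1 * 60 + $2; next }
        { exit 1 }'
}

# Print the PID of every process matched by pgrep "$1" whose accumulated
# CPU time is greater than "$2" seconds.
pids_over_limit() {
    pgrep "$1" | while read -r pid; do
        # time= suppresses the header; tr strips the column padding.
        cputime=$(ps -o time= -p "$pid" | tr -d ' ')
        # Skip processes that exited in the meantime or printed an odd field.
        secs=$(to_seconds "$cputime") || continue
        if [ "$secs" -gt "$2" ]; then
            echo "$pid"
        fi
    done
}
```

With these defined, pids_over_limit php-cgi 60 | xargs kill would play the same role as the pipeline above, with the limit given in seconds.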

Solution 3:

I solved this exact issue with ps-watcher and wrote about it on linux.com a few years back. ps-watcher does allow you to monitor processes and kill them based on accumulated run time. Here's the relevant ps-watcher configuration, assuming your process is named 'foo':

[foo]
  occurs = every
  trigger = elapsed2secs('$time') > 1*HOURS && $ppid != 1
  action = <<EOT
  echo "$command accumulated too much CPU time" | /bin/mail user\@host
  kill -TERM $pid
EOT

[foo?]
   occurs = none
   action = /usr/local/etc/foo restart

The key is the line

trigger = elapsed2secs('$time') > 1*HOURS && $ppid != 1

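which says: if the accumulated process time is more than 1 hour and the process's parent is not PID 1 (that is, it is a child rather than the top-level parent process), run the action, which sends a mail and kills the process. The [foo?] section then restarts foo when no matching process is running.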

I realize this answer doesn't use monit, but it does work. ps-watcher is lightweight and simple to set up, so there's no harm in running it alongside your monit setup.