We do not have any sort of proper server monitoring solution in place (a situation I'd like to rectify this summer), but there is one service in particular that I'd like to monitor. [Our current monitoring system, waiting for clients to call in with a problem, works well for widely used systems, but this service does not affect as many people as, say, DHCP.]

I'm running All The Right Type 3 Server on (sigh) an OS X 10.3 (Panther) box [because it doesn't start itself automatically on anything newer, and our software procurement person really doesn't want to get updated software]. The client software is working fine on the students' machines, under OS X 10.5 (Leopard).

Now then, I occasionally get a call that this server is down [and the machine itself is still up!], and I'd really prefer to know before someone has to pick up the phone. The process appears to be called "atrtserv.osx". While it does use the network, I would be surprised if its documentation described how to send a message to see whether the service is up, so something coarse-grained would do, such as checking every five minutes that the process exists.

Is there a simple way to monitor one service, esp. on the Mac?


Solution 1:

I don't guarantee that this code works right (most especially the line that starts with "RUNNING="), but you can substitute some test of your own that returns a string if the process is running and no string when it isn't.

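#!/bin/bash

# Check every five minutes whether atrtserv is running; send mail if it is not.
while true ; do
    # Non-empty only when a matching process exists ("grep -v grep" drops the grep itself).
    RUNNING=`ps aux | grep atrtserv | grep -v grep`
    if [ -z "$RUNNING" ] ; then
        echo "atrtserv.osx is down" | mail -s "atrtserv down" [email protected]
    fi
    sleep 300
done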

Edit: Use the pgrep approach from the other answer instead of my hack.

Solution 2:

Well, the real simple way probably looks like:

pgrep atrtserv.osx >/dev/null || (
    date |
    mail -s 'atrtserv.osx down on yourmachine.foo' [email protected]
)

in a cron job running at the interval of your choice.
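
For the five-minute interval mentioned in the question, the cron setup might look roughly like the sketch below. The script path /usr/local/bin/check_service.sh, the ALERT_ADDR variable with its placeholder address, and the function names are all made up for illustration. It also assumes that pgrep and mail exist on the server; pgrep may not ship with older OS X releases, so check for it first and fall back to a ps-based test if it is missing.

```shell
#!/bin/sh
# Sketch of a cron-driven process check. The script path, the
# ALERT_ADDR default, and the function names are illustrative.
#
# Example crontab entry (every five minutes):
#   */5 * * * * /usr/local/bin/check_service.sh atrtserv.osx

# Placeholder recipient; override it through the environment.
ALERT_ADDR="${ALERT_ADDR:-admin@example.com}"

# Succeed if a process with exactly this name exists.
# Assumes pgrep is installed; if it is not, this always fails.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Mail a timestamped alert naming the missing process.
send_alert() {
    date | mail -s "$1 down on $(hostname)" "$ALERT_ADDR"
}

# Alert only when the process is missing; return 1 in that case.
check_service() {
    if is_running "$1"; then
        return 0
    fi
    send_alert "$1"
    return 1
}

# Run the check only when a process name is given on the command line.
if [ $# -gt 0 ]; then
    check_service "$1"
fi
```

Keeping the lookup and the alert in separate functions makes it easy to swap in a different test (the ps/grep pipeline, for instance) without touching the rest. Note that this mails on every run while the process is down, so a long outage produces one message per interval.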