How to get Monit to re-monitor a service it has unmonitored?
While devising an answer to this question, I ran into a snag when testing this MySQL Monit ruleset on an Ubuntu 12.04.5 setup:
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
  group mysql
  start program = "/etc/init.d/mysql start"
  stop program = "/etc/init.d/mysql stop"
  if failed host 127.0.0.1 port 3306
     with timeout 15 seconds
     then restart
  if 5 restarts within 5 cycles
     then timeout
  alert [email protected] only on { timeout, nonexist }
The issue is that I was invoking the start/stop commands via /etc/init.d/, which is more of a CentOS/RedHat convention, instead of using /usr/sbin/service, which is more appropriate on an Ubuntu/Debian system.
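For comparison, here is a minimal sketch of the same ruleset with start/stop routed through /usr/sbin/service instead. It is wrapped in a shell function that just prints the rules so they can be piped into a config file. The function name mysql_monit_rules is hypothetical, the alert address is a placeholder, and the /etc/monit/conf.d/ destination (the usual include directory on Debian/Ubuntu) is an assumption about your monitrc.

```shell
#!/bin/sh
# Hypothetical helper: prints the MySQL ruleset from the question, but with
# start/stop going through /usr/sbin/service (Ubuntu/Debian style) instead of
# /etc/init.d/. The alert address below is a placeholder.
mysql_monit_rules() {
  cat <<'EOF'
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
  group mysql
  start program = "/usr/sbin/service mysql start"
  stop program = "/usr/sbin/service mysql stop"
  if failed host 127.0.0.1 port 3306
     with timeout 15 seconds
     then restart
  if 5 restarts within 5 cycles
     then timeout
  alert admin@example.com only on { timeout, nonexist }
EOF
}

# Example usage (assumes monitrc includes /etc/monit/conf.d/*):
#   mysql_monit_rules | sudo tee /etc/monit/conf.d/mysql >/dev/null
#   sudo monit -t && sudo monit reload   # syntax check, then reload
```

Only the two program lines differ from the original ruleset; monit -t checks the control file syntax before the reload is attempted.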
Okay, my bad… But see that “if 5 restarts within 5 cycles then timeout” part? That seems to have bitten me hard. With the /etc/init.d/mysql start command unable to work, Monit attempted 5 restarts, failed 5 times, and then timed out as a result. And the timeout condition seems to result in the MySQL service ruleset being ignored by Monit.
I’ve restarted the Monit service a few times and even rejiggered the ruleset to see if that helps, but none of that seems to have any effect.
What can I do to get Monit to pay attention to rulesets it has “unmonitored” due to timeout conditions being met?
Monit includes commands to enable and disable monitoring of all services or of specific ones.
If a service has become unmonitored, you can re-enable monitoring with, e.g., monit monitor mysqld (the name from the check process line in the ruleset above) or monit monitor all.
Note that the Monit HTTP interface (the set httpd section of the control file) must be enabled for these commands to work.
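A minimal sketch of that, assuming it runs as root on the monitored host and that the control file is /etc/monit/monitrc (the Debian/Ubuntu default; override with the MONITRC variable). The remonitor function name is hypothetical. It only warns when no uncommented set httpd line is found in that one file, since settings pulled in through include directives are not checked.

```shell
#!/bin/sh
# Hypothetical helper: re-enable monitoring of one service (default: mysqld,
# the name from the "check process" line) and print a status summary.
remonitor() {
  service_name="${1:-mysqld}"
  conf="${MONITRC:-/etc/monit/monitrc}"

  # monitor/unmonitor reach the running daemon over its HTTP interface,
  # so warn if the control file has no uncommented "set httpd" line.
  if ! grep -qs '^[[:space:]]*set httpd' "$conf"; then
    echo "warning: no 'set httpd' line found in $conf" >&2
  fi

  monit monitor "$service_name" && monit summary
}

# Example usage (as root):
#   remonitor mysqld
#   monit monitor all    # re-enable every service instead
```

After it runs, monit summary should no longer list the service as not monitored.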
After some digging, it turns out Monit stores its monitoring data in a “state” file, and this “state” file keeps track of which services are monitored or unmonitored.
So while this is a bit “brute force”-ish, it definitely works: if a service becomes “unmonitored” because of something like a timeout, just remove the Monit state file from the system like this:
sudo rm /var/lib/monit/state
And then restart Monit like this and all should be good:
sudo service monit restart
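The same reset can be wrapped in a small function, sketched below. It stops Monit before deleting the file, so the daemon cannot write the old state back while the file is being removed, and then starts it again. The function name reset_monit_state is hypothetical; the default path is the /var/lib/monit/state location used above.

```shell
#!/bin/sh
# Hypothetical helper: clear Monit's saved state so previously unmonitored
# services start out monitored again. Run as root.
reset_monit_state() {
  state_file="${1:-/var/lib/monit/state}"

  service monit stop || return 1
  # -f: don't fail if the file is already gone.
  rm -f "$state_file"
  service monit start
}

# Example usage (as root):
#   reset_monit_state                         # Ubuntu/Debian default path
#   reset_monit_state /path/to/monit.state    # other setups
```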
FWIW, on other systems/setups the Monit “state” file might be saved as state, monit.state, or even .monit.state (with a leading dot) in another directory. Be sure to determine exactly where that “state” file is being saved before you actually attempt this fix.
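One way to find it, sketched under some assumptions: Monit's control file can set the location with a set statefile line, and otherwise Monit defaults to $HOME/.monit.state for the user it runs as. The function below checks $MONITRC and a few common control-file locations (that list is an assumption, and files pulled in with include are not followed), printing the first statefile path it finds or the default. The name find_monit_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the path of Monit's state file.
# Checks $MONITRC first, then some common control-file locations.
find_monit_state() {
  for rc in "${MONITRC:-}" /etc/monit/monitrc /etc/monitrc "$HOME/.monitrc"; do
    [ -n "$rc" ] && [ -r "$rc" ] || continue
    # Take the argument of the first uncommented "set statefile <path>" line.
    sf=$(sed -n 's/^[[:space:]]*set[[:space:]][[:space:]]*statefile[[:space:]][[:space:]]*\([^[:space:]]*\).*/\1/p' "$rc" | head -n 1)
    if [ -n "$sf" ]; then
      echo "$sf"
      return 0
    fi
  done
  # Monit's default when no statefile is set; $HOME here must be the home
  # directory of the user Monit runs as (often root) for this to be right.
  echo "$HOME/.monit.state"
}

# Example usage (in a root shell, matching the user Monit runs as):
#   find_monit_state
```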