Monitoring NFS with Monit
I'd like to monitor NFS mounts and the NFS server process using Monit.
On the server, I'd need a PID file, but I can't find a way to get one created with the existing configuration files. Is there a way to do this, or has anyone monitored the server differently (e.g. by checking whether port 2049 is open)?
On clients, I was thinking of having Monit simply look for a specific file on an NFS mount; if it's accessible, all is well. The problem is that if the NFS server goes down, file requests usually hang (possibly indefinitely, I'm not sure). How would one get around this with Monit?
Any configuration examples would be greatly appreciated!
As for Monit "hanging" while the NFS server is down, this can be avoided in two ways:
- Change the NFS mount options from hard to soft. This causes the NFS layer to return an I/O error to the accessing application after retrans retries. Because this can introduce data-integrity problems (your writing applications must be able to cope with I/O errors, or at least exit cleanly without corrupting the file being written), you may prefer the second option instead.
- Decouple (asynchronize) your check from Monit. Define a cron job that regularly checks your NFS-mounted file and writes a separate "NFS state file", e.g. to /tmp. That way, only the cron job will hang (not your Monit client) if the NFS server goes away. Your Monit check then only inspects this second-stage "NFS state file", including whether it is much older than the cron job's interval (which would indicate that NFS is hanging).
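A minimal sketch of the cron side of the second option, assuming a small probe file exists on the NFS mount. The script name nfs-state.sh, the probe path /mnt/nfs/.monit-probe and the state path /tmp/nfs_state are hypothetical examples. The state file is rewritten only when the probe can actually be read, so both a hang and an outright failure look the same to Monit: a state file that stops getting newer.

```shell
#!/bin/sh
# Sketch: cron-side NFS probe that keeps Monit itself away from the NFS mount.
# Usage (hypothetical paths): nfs-state.sh /mnt/nfs/.monit-probe /tmp/nfs_state

# refresh_nfs_state PROBE_FILE STATE_FILE
# Rewrites STATE_FILE with the current epoch time, but only if PROBE_FILE
# (a file on the NFS mount) can actually be read.
refresh_nfs_state() {
    probe_file=$1
    state_file=$2
    # On a hard mount this read may block while the server is down; then only
    # this cron job hangs and STATE_FILE simply stops being refreshed.
    if cat "$probe_file" > /dev/null 2>&1; then
        # Write to a temporary file and rename it, so Monit never reads a partial file.
        date +%s > "$state_file.tmp" && mv "$state_file.tmp" "$state_file"
        return
    fi
    return 1
}

# Only run the probe when both paths are given on the command line.
if [ "$#" -eq 2 ]; then
    refresh_nfs_state "$1" "$2"
fi
```

It could be run from cron every few minutes (e.g. */5 * * * * /usr/local/bin/nfs-state.sh /mnt/nfs/.monit-probe /tmp/nfs_state), with Monit watching the state file via a check file entry and a rule such as if timestamp > 15 minutes then alert. The threshold should be comfortably larger than the cron interval.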
Hope this helps!
The general approach would be (assuming none of the Monit built-in rules are applicable):
- Find out how you would do the checks manually.
- Write shell scripts performing these checks, returning 0 for 'success' and 1 for 'failure'.
- Let Monit test those scripts (example from the official documentation):
    check program myscript with path "/usr/local/bin/myscript.sh"
        if status != 0 then alert
For the specific problem, this could mean:
- Server: It probably depends on your OS, Linux distribution, NFS version (3 or 4), etc., but it should be easy to figure out. E.g. on Ubuntu 12.04, I would test whether the NFS server is running via
    $ service portmap status
    $ service nfs-kernel-server status
  Create a shell script returning 0 if both commands report 'running'.
- Client: To check whether a certain NFS share is currently mounted, I mostly use df -h. So the corresponding shell script would look like
    #!/bin/bash
    df -h | grep -q thesharename
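For the server item, a minimal sketch of such a script, assuming (as on Ubuntu 12.04) that a service's status output contains the word running when it is up. It explicitly rejects "not running", since a plain grep for running would also match that text. The script name nfs-server-check.sh is hypothetical; the service names are passed as arguments.

```shell
#!/bin/sh
# Sketch: exit 0 only if every service named on the command line reports "running".
# Example (hypothetical path): nfs-server-check.sh portmap nfs-kernel-server

# status_says_running: succeeds if the status text on stdin says "running"
# but not "not running".
status_says_running() {
    out=$(cat)
    case $out in
        *"not running"*) return 1 ;;
        *running*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_services SERVICE...: fails as soon as one service is not running.
check_services() {
    for svc in "$@"; do
        # 2>&1 so that "unrecognized service" errors are also treated as failure.
        service "$svc" status 2>&1 | status_says_running || return 1
    done
    return 0
}

check_services "$@"
```

Following the same pattern as the df example further down, Monit could call it with the arguments inside the path string, e.g. check program nfs-server with path "/usr/local/bin/nfs-server-check.sh portmap nfs-kernel-server" together with if status != 0 then alert.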
Have you already checked the NFS init scripts? I suspect they create a PID file and store it somewhere for later restart or stop operations. If not, it should be fairly simple to modify them to do so.
As far as checking the mount goes, take a look at section 4.3.1 of the NFS HOWTO (http://nfs.sourceforge.net/nfs-howto/ar01s04.html#mounting_remote_dirs). If you mount the share with the 'soft' option, an unreachable server produces an I/O error instead of a hang, which lets you monitor it; however, 'soft' should not be used for the mount your applications actually rely on. Perhaps you want a second mount just for monitoring?
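A sketch of such a dedicated monitoring mount. The server name nfsserver, the export /export and the mount point /mnt/nfs-monitor are hypothetical, and the timeo/retrans values are only example choices (timeo is given in tenths of a second). Mounting it read-only keeps the data-integrity risk of soft away from this mount.

```shell
# Hypothetical server, export and mount point; adjust to your setup.
# soft: return an I/O error after 'retrans' retries instead of retrying forever.
# ro:   the monitoring mount never writes, so soft cannot corrupt data through it.
mkdir -p /mnt/nfs-monitor
mount -t nfs -o ro,soft,timeo=50,retrans=2 nfsserver:/export /mnt/nfs-monitor
```

Monit (or a cron job) can then probe files under /mnt/nfs-monitor, while applications keep using the regular hard mount.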
I'm using the df test directly, without a separate script:
check program nfs-var with path "/bin/df -t nfs4 /var"
if status != 0 then alert
if status = 1 then exec "/bin/mount /var"