How to monitor NFS load from userland?
Apologies if I'm not using the proper jargon (although I'm a longtime Linux user, I'm not an admin) or if this is a FAQ (searching SE got lots of hits, but I didn't see anything quite like this question):
I'm a user on a science cluster (with jobs managed by PBS/Torque, on RHEL5, FWIW). I'm about to start my first really-big job, so I asked the admin some configuration questions, to avoid stupid mistakes. I was mostly right, but he added the advice to "make sure you are not hammering the disk server with too much I/O," with followup to "use top [to] see if the nfs is going nuts."
How to do that? This is a cluster, so a lot is going on "behind the scenes" that is transparent to me. Plus I have next-to-no privileges. I also am limited to CLI via SSH, but that's the least of my problems. On the plus side, I do seem to be able to shell into any of the compute nodes, including those with attached disk(s).
So I'm wondering, how best to monitor NFS from userland? I know a little bit about top and NFS, so I know I can do
top -p$(pgrep nfsd -d ',')
to get the list of NFS processes (no?). But what I'd really like to know--again, as a user (I have neither sudo nor root) on RHEL5 (yes, we're still running that)--are
- One, or a few, aggregate statistics for NFS load across all NFS processes. Is this something I can get from top or another tool, without scraping output and doing my own math? And should I be monitoring processes other than nfsd?
- Advice concerning quantification of "NFS going nuts." If I can get one/few aggregate statistics, I can presumably get a pre-my-job baseline, but that still doesn't tell me "how high is too high."
Note: top appears not to be the tool to use for this task, but at least it is available to me. Tools that are not available include
- nfsstat
- iostat
- iotop
Looking at top output is the wrong approach. top reports per-process CPU and memory use, whereas NFS load is about IOPS. To get a view of the NFS statistics, use nfsstat:
Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
40833255   0          0          0          0

Server nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 1411374   3% 107       0% 43169     0% 747514    1% 790       0%
read         write        create       mkdir        symlink      mknod
38138706 93% 0         0% 0         0% 0         0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
0         0% 0         0% 0         0% 0         0% 0         0% 491559    1%
fsstat       fsinfo       pathconf     commit
6         0% 12        0% 6         0% 0         0%
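If nfsstat is not installed, the same counters can usually be read directly, since nfsstat itself gets them from /proc. The following is only a sketch: it assumes the server exposes a "proc3" line in /proc/net/rpc/nfsd, with the per-operation counters in the same order as the nfsstat output above, and that the file is readable by an unprivileged user. The function name nfs3_summary is made up for illustration.

```shell
# Sketch: summarize NFSv3 server counters without nfsstat.
# Assumed input format (one line of /proc/net/rpc/nfsd):
#   proc3 22 <null> <getattr> <setattr> <lookup> <access> <readlink> <read> <write> ...
nfs3_summary() {
    awk '$1 == "proc3" {
        total = 0
        # fields 3..NF are the per-operation counters
        for (i = 3; i <= NF; i++) total += $i
        # field 3 is null, so read is field 9 and write is field 10
        printf "total=%d read=%d write=%d\n", total, $9, $10
    }'
}

# Usage (only meaningful on the node that runs nfsd):
#   nfs3_summary < /proc/net/rpc/nfsd
```

The counters are cumulative since boot, so a single reading says little about current load. Taking two readings a known number of seconds apart and dividing the difference by the interval gives operations per second.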
If you have a monitoring program (fer instance, Zabbix) you can add a UserParameter to watch them:
# NFS stats
UserParameter=nfs.v3.server[*],nfsstat -s -l | awk 'BEGIN {FS=": *"}/v3 server.*$1:/ {print $$2}'
and make pretty graphs.
How high is too high? It totally depends on your workload.
You need to watch the filesystem and disk latency to see if you're overloading the disks.
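From a compute node without root, one rough way to watch latency is the client-side per-mount statistics. This is a sketch under assumptions: it assumes the kernel provides /proc/self/mountstats, and that each per-operation line has the form "OP: ops trans timeouts bytes_sent bytes_recv queue_ms rtt_ms execute_ms", which is the layout the nfs-iostat script parses. The function name nfs_client_latency is hypothetical.

```shell
# Sketch: average round-trip time per NFS READ/WRITE, summed over all mounts.
# Assumed per-operation line layout in /proc/self/mountstats:
#   READ: <ops> <trans> <timeouts> <bytes_sent> <bytes_recv> <queue_ms> <rtt_ms> <execute_ms>
nfs_client_latency() {
    awk '$1 == "READ:" || $1 == "WRITE:" {
        ops[$1] += $2   # completed operations
        rtt[$1] += $8   # cumulative round-trip time in milliseconds
    }
    END {
        for (op in ops)
            if (ops[op] > 0)
                printf "%s avg_rtt_ms=%.2f ops=%d\n", op, rtt[op] / ops[op], ops[op]
    }' | sort
}

# Usage:
#   nfs_client_latency < /proc/self/mountstats
```

These numbers are averages since the filesystem was mounted. Comparing a reading taken before the job starts with one taken while it runs shows whether round-trip time is climbing, which is a more direct sign of an overloaded server than raw operation counts.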