How to monitor NFS load from userland?

Apologies if I'm not using the proper jargon (although I'm a longtime Linux user, I'm not an admin) or if this is a FAQ (searching SE got lots of hits, but I didn't see anything quite like this question):

I'm a user on a science cluster (with jobs managed by PBS/Torque, on RHEL5, FWIW). I'm about to start my first really-big job, so I asked the admin some configuration questions, to avoid stupid mistakes. I was mostly right, but he added the advice to "make sure you are not hammering the disk server with too much I/O," with followup to "use top [to] see if the nfs is going nuts."

How to do that? This is a cluster, so a lot is going on "behind the scenes" that is transparent to me. Plus I have next-to-no privileges. I also am limited to CLI via SSH, but that's the least of my problems. On the plus side, I do seem to be able to shell into any of the compute nodes, including those with attached disk(s).

So I'm wondering, how best to monitor NFS from userland? I know a little bit about top and NFS, so I know I can do

top -p$(pgrep nfsd -d ',')

to get the list of NFS processes (no?). But what I'd really like to know--again, as a user (I have neither sudo nor root) on RHEL5 (yes, we're still running that)--are

1. One, or a few, aggregate statistics for NFS load across all NFS processes. Is this something I can get from top or another tool, without scraping output and doing my own math? And should I be monitoring processes other than nfsd?
2. Advice concerning quantification of "NFS going nuts." If I can get one/few aggregate statistics, I can presumably get a pre-my-job baseline, but that still doesn't tell me "how high is too high."

Note: top appears not to be the right tool for this task, but at least it is available to me. Tools that are not available include

nfsstat
iostat
iotop

Looking at top output is the wrong approach: what matters here is IOPS, not the CPU usage of the nfsd processes. To view the NFS statistics, use nfsstat:

Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
40833255   0          0          0          0       

Server nfs v3:
null         getattr      setattr      lookup       access       readlink     
0         0% 1411374   3% 107       0% 43169     0% 747514    1% 790       0% 
read         write        create       mkdir        symlink      mknod        
38138706 93% 0         0% 0         0% 0         0% 0         0% 0         0% 
remove       rmdir        rename       link         readdir      readdirplus  
0         0% 0         0% 0         0% 0         0% 0         0% 491559    1% 
fsstat       fsinfo       pathconf     commit       
6         0% 12        0% 6         0% 0         0%
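If nfsstat itself is not installed, the same counters can usually be read straight from the kernel. Here is a minimal sketch, assuming the stats files /proc/net/rpc/nfsd (on the server) or /proc/net/rpc/nfs (on a client) exist and are readable by ordinary users, and that their "proc3" line lists the NFSv3 operations in the same order nfsstat prints them. The function name nfs3_ops is just something I made up for illustration.

```shell
#!/bin/sh
# Print NFSv3 per-operation counters from a kernel RPC stats file.
# Assumption: the file has a line of the form
#   proc3 <number of ops> <null count> <getattr count> ...
# with operations in the same order as in nfsstat output.
nfs3_ops() {
    awk '
        BEGIN {
            n = split("null getattr setattr lookup access readlink read write create mkdir symlink mknod remove rmdir rename link readdir readdirplus fsstat fsinfo pathconf commit", name, " ")
        }
        $1 == "proc3" {
            total = 0
            # Field 2 is the op count, so counter i sits in field i + 2.
            for (i = 1; i <= n && i + 2 <= NF; i++) {
                total += $(i + 2)
                printf "%s %s\n", name[i], $(i + 2)
            }
            printf "total %.0f\n", total
        }
    ' "$1"
}

# Prefer server-side counters when present, else fall back to client-side.
stats=/proc/net/rpc/nfsd
if [ ! -r "$stats" ]; then
    stats=/proc/net/rpc/nfs
fi
if [ -r "$stats" ]; then
    nfs3_ops "$stats"
fi
```

On a compute node you will most likely only see the client-side file, which tells you what that node is asking of the server rather than the server's total load.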

If you have a monitoring program (for instance, Zabbix), you can add a UserParameter to watch them:

# NFS stats
UserParameter=nfs.v3.server[*],nfsstat -s -l | awk 'BEGIN {FS=": *"}/v3 server.*$1:/ {print $$2}'

and make pretty graphs.

How high is too high? It depends entirely on your workload:

[Image: NFS graph]

You need to watch the filesystem and disk latency to see if you're overloading the disks.
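For the baseline part of the question, the counters are cumulative, so a rate needs two samples. A rough sketch, assuming the second field of the "rpc" line in /proc/net/rpc/nfs (client) or /proc/net/rpc/nfsd (server) is the total call count, as in the "calls" column above. The helper names rpc_calls and rate and the INTERVAL variable are hypothetical.

```shell
#!/bin/sh
# Total RPC calls so far: second field of the "rpc" line.
rpc_calls() {
    awk '$1 == "rpc" { print $2 }' "$1"
}

# Calls per second, given two counter readings and the seconds between them.
rate() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

stats=/proc/net/rpc/nfs        # client side; use /proc/net/rpc/nfsd on the server
interval=${INTERVAL:-10}       # sampling interval in seconds (arbitrary default)
if [ -r "$stats" ]; then
    before=$(rpc_calls "$stats")
    sleep "$interval"
    after=$(rpc_calls "$stats")
    echo "NFS RPC calls/s: $(rate "$before" "$after" "$interval")"
fi
```

Run it once before the job starts and again while it is running; the difference between the two rates is roughly what your job adds. It still will not tell you where the disks saturate, only how much you are contributing.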
