Tuning NFS client/server stack

Solution 1:

Just to clarify, you're getting 50MB/sec with NFS over a single Gb ethernet connection?

And the host server is running CentOS with VMware Server installed, which is in turn running the 7 VMs? Is there a particular reason you've gone with CentOS and VMware Server combined, rather than VMware ESXi which is a higher performance solution?

The 50MB/sec isn't great, but it's not much below what you'd expect over a single Gb network cable - once you've put in the NFS tweaks people have mentioned above you're going to be looking at maybe 70-80MB/sec. Options along the lines of:

"ro,hard,intr,retrans=2,rsize=32768,wsize=32768,nfsvers=3,tcp"

are probably reasonable for you at both ends of the system.
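Dropped into /etc/fstab, those options would look something like this (the server name and paths here are hypothetical placeholders):

```
# /etc/fstab - hypothetical server and mount point; swap ro for rw
# if the clients need write access to the export
nfsserver:/export/vms  /mnt/vms  nfs  ro,hard,intr,retrans=2,rsize=32768,wsize=32768,nfsvers=3,tcp  0 0
```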

To get above that you're going to need to look at teaming the network cards into pairs, which should increase your throughput by about 90%. You might need a switch that supports 802.3ad to get the best performance with link aggregation.
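On a RHEL/CentOS 5-era box, an 802.3ad bond is set up roughly like this (interface names and addresses are placeholders, and the switch ports have to be configured for LACP as well):

```
# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=802.3ad miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (same again for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

Note that a single NFS client TCP connection still rides one physical link, so aggregation helps most with several clients hitting the server at once.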

One thing I'd point out though is that the IO throughput on the OpenSolaris box sounds suspiciously high; 12 disks aren't likely to sustain 1.6GB/sec of throughput, so that figure is probably heavily cached by Solaris + ZFS.

Solution 2:

For our RHEL/CentOS 5 machines we use the following mount flags:

nfsvers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,intr,noatime

Newer Linux kernels support even larger rsize/wsize values, but 32k is the maximum for the 2.6.18 kernel in EL5.

On the NFS server(s), at least on Linux, no_wdelay supposedly helps if you have a disk controller with BBWC (battery-backed write cache). Also, if you use the noatime flag on the clients, it probably makes sense to mount the filesystems on the servers with noatime as well.

And, as was already mentioned, don't bother with UDP. With higher speed networks (1GbE+) there is a small, but non-zero, chance of a sequence number wraparound causing data corruption. Also, if there is a possibility of packet loss, TCP will perform better than UDP.

If you're not worrying about data integrity that much, the "async" export option can be a major performance improvement (the problem with async is that you might lose data if the server crashes).
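On the server those export options live in /etc/exports; a sketch (paths and client network are placeholders) might look like:

```
# /etc/exports - hypothetical paths and client network
# sync + no_wdelay: safe writes, but skip the write-gathering delay
# (sensible when the controller has a battery-backed write cache)
/export/vms      192.168.1.0/24(rw,sync,no_wdelay,no_subtree_check)

# async: much faster writes, but anything not yet on disk is lost
# if the server crashes
/export/scratch  192.168.1.0/24(rw,async,no_subtree_check)
```

Run `exportfs -ra` after editing to apply the changes.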

Also, at least on a Linux server, you need to make sure you have enough NFS server threads running. The default of 8 is just way too low.
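On EL5 the thread count is a config fragment in /etc/sysconfig/nfs, and you can check whether the existing threads are getting saturated from /proc:

```
# /etc/sysconfig/nfs
RPCNFSDCOUNT=64

# apply with: service nfs restart

# The "th" line shows the thread count and how often all threads
# were busy at once (histogram buckets); persistently non-zero
# rightmost buckets mean you need more threads
grep ^th /proc/net/rpc/nfsd
```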

Solution 3:

I once did a test with a Dell R710 (1 CPU, 4 GB RAM, 6 SATA disks in RAID-10). The client was a Sun X2100; both ran CentOS 5.3 with the NFS parameters mentioned above

"ro,hard,intr,retrans=2,rsize=32768,wsize=32768,nfsvers=3,tcp"

mounted on both sides with noatime.

I also bumped the number of nfsd threads up to 256 and used the noop scheduler for the PERC 6 RAID controller. Another thing I did was align the partitions to the 64K stripe size of the RAID controller.
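For reference, the scheduler change and the alignment look roughly like this (the device name is a placeholder; 64K with 512-byte sectors means starting a partition at sector 128 rather than the old fdisk default of 63):

```
# Use the noop elevator for the PERC volume - the RAID controller
# does its own reordering, so the kernel's shouldn't fight it
# (sdb is a placeholder device name)
echo noop > /sys/block/sdb/queue/scheduler

# Align the first partition to the 64K stripe: run fdisk in sector
# mode and start the partition at sector 128 instead of 63
fdisk -u /dev/sdb
```

The scheduler setting doesn't survive a reboot, so on EL5 it usually goes in rc.local or on the kernel command line via elevator=noop.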

Then I measured the NFS performance with dd - for reads I could fill the GigE pipe, but for writes I could only get slightly better results than you. With async enabled I could get 70 to 80 MB/s, but async was not an option for me.
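A rough version of that dd test looks like the following sketch. TESTDIR would be the NFS mount point in a real run; it defaults to /tmp here so the commands work anywhere, and the file size is kept small for illustration (on a real benchmark use a file well over the client's RAM so caching doesn't dominate).

```shell
TESTDIR=${TESTDIR:-/tmp}   # point this at the NFS mount for a real test

# Write test: conv=fsync makes dd flush data out before reporting,
# so the figure reflects actual write throughput, not client caching
dd if=/dev/zero of="$TESTDIR/nfs-ddtest" bs=1M count=64 conv=fsync

# Read test. On a real run, drop the client page cache first so the
# reads actually cross the network:
#   echo 3 > /proc/sys/vm/drop_caches
dd if="$TESTDIR/nfs-ddtest" of=/dev/null bs=1M

# Clean up the test file
rm -f "$TESTDIR/nfs-ddtest"
```

dd prints the achieved throughput on its final status line, which is the number to compare against the ~117 MB/s theoretical ceiling of GigE.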

Maybe you just can't get much more out of NFS over a single GigE link?