Is this a Linux NFS client bufferbloat?
OK, here's my answer.
This is related to https://bugzilla.redhat.com/show_bug.cgi?id=688232. With kernels 2.6.18 and 2.6.32 as shipped by Red Hat (I haven't had time to re-validate this with newer vanilla kernels), on an NFS client (v3 / tcp / default mount options), when a process is writing to a file, the kernel also needs to update the timestamps of that file. While the file is being written, if another process wants the metadata of this file (such as when doing a stat on the file or ls -l in its parent directory), this reader process is delayed by the kernel until the write has finished.
At the NFS level, I can see that the kernel issues the GETATTR call only after all of the WRITE calls have completed (I am not sure about "all", but in my tests up to 5 GiB, the stat time seemed to match the dd time). The bigger the write, the longer the wait.
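For anyone who wants to reproduce the timing comparison without juggling dd and stat in two terminals, here is a minimal sketch. It is not what I ran for the tests above; the function names and the thread-based approach are my own choices for illustration. It writes a file in one thread and times a single os.stat() issued from the main thread while the write is in progress.

```python
import os
import threading
import time


def write_file(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to path, roughly like dd if=/dev/zero."""
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
    return written


def time_stat_during_write(path, total_bytes):
    """Return (stat_seconds, write_seconds) for a stat issued mid-write."""
    result = {}

    def writer():
        start = time.monotonic()
        write_file(path, total_bytes)  # includes the flush done at close
        result["write"] = time.monotonic() - start

    t = threading.Thread(target=writer)
    t.start()
    # Wait until the writer has created the file before calling stat.
    while not os.path.exists(path) and t.is_alive():
        time.sleep(0.01)
    start = time.monotonic()
    os.stat(path)
    stat_seconds = time.monotonic() - start
    t.join()
    return stat_seconds, result["write"]
```

Calling time_stat_during_write() with a path on the NFS mount and a size of a few GiB should, if the behaviour described here applies to your kernel, return a stat time close to the write time. On a local filesystem the stat time should be close to zero.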
With a slow NFS server or a server with a lot of RAM, that delay can be minutes. While the stat(2) call is put to sleep, one can monitor /proc/meminfo for NFS_Unstable or Writeback, which show how much data is in flight.
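To watch those two counters, something like the following sketch can be used. The helper names are hypothetical, and it assumes the usual "Name:   value kB" layout of /proc/meminfo. Fields missing on a given kernel are reported as 0.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value.

    Most fields are in kB; a few (such as the HugePages counters) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def in_flight_kb(text):
    """Return the Writeback and NFS_Unstable values from meminfo text."""
    info = parse_meminfo(text)
    return {key: info.get(key, 0) for key in ("Writeback", "NFS_Unstable")}


def read_in_flight_kb(path="/proc/meminfo"):
    """Read the live counters from the running system (Linux only)."""
    with open(path) as f:
        return in_flight_kb(f.read())
```

Calling read_in_flight_kb() in a loop with a short sleep while the write runs should show the values grow and then drain as the data reaches the server.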
I am not sure why the kernel does this, but at least now I understand the behaviour. So there's no bufferbloat, but some operations are serialized.