Kubernetes can't mount NFS volumes after NFS server update and reboot
After zypper patch'ing the NFS server on openSUSE Leap 15.2 to the latest version and rebooting, nodes in the Kubernetes cluster (OpenShift 4.5) can no longer mount NFS volumes.
NFS server version: nfs-kernel-server-2.1.1-lp152.9.12.1.x86_64
/etc/exports contains:
/nfs 192.168.11.*(rw,sync,no_wdelay,root_squash,insecure,no_subtree_check,fsid=0)
Affected pods are in ContainerCreating status, and
kubectl describe pod/<pod_name>
gives the following error:
Warning FailedMount 31m kubelet MountVolume.SetUp failed for volume "volume" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/c86dee2e-f533-43c9-9a1d-c4f00a1b8eef/volumes/kubernetes.io~nfs/smart-services-http-video-stream --scope -- mount -t nfs nfs.example.invalid:/nfs/volume /var/lib/kubelet/pods/c86dee2e-f533-43c9-9a1d-c4f00a1b8eef/volumes/kubernetes.io~nfs/pv-name
Output: Running scope as unit: run-r83d4e7dba1b645aca1e4693a48f45191.scope
mount.nfs: Operation not permitted
The server is running NFSv4 only, so rpcbind is turned off and showmount does not work.
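For completeness, the protocol versions nfsd actually advertises can be checked directly on the server; disabled versions are prefixed with - and enabled ones with +:
cat /proc/fs/nfsd/versions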
Mounting directly on a Kubernetes node results in the following error:
sudo mount.nfs4 nfs.example.invalid:/core tmp/ -v; echo $?
mount.nfs4: timeout set for Wed Jul 21 12:16:49 2021
mount.nfs4: trying text-based options 'vers=4.2,addr=192.168.11.2,clientaddr=192.168.11.3'
mount.nfs4: mount(2): Operation not permitted
mount.nfs4: Operation not permitted
32
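Since mount.nfs4 only reports the generic "Operation not permitted", the node's kernel log is sometimes more specific, so it is worth checking right after a failed attempt:
dmesg | tail -n 20     # look for NFS/RPC messages from the last mount attempt
journalctl -k -n 50    # same information via the journal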
firewalld rules on the NFS server:
services: ssh dhcpv6-client nfs mountd rpc-bind samba http tftp
ports: 2049/tcp 2049/udp
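For anyone verifying the same thing, the effective firewalld configuration can be listed, and the nfs service (re)added, with the standard firewall-cmd calls; the default zone is assumed here:
firewall-cmd --list-all                   # show services and ports in the active zone
firewall-cmd --permanent --add-service=nfs
firewall-cmd --reload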
AppArmor was enabled; turning it off didn't change the outcome.
Before updating the NFS server everything was working fine, and no other configuration changes were made. How can I debug this further and make the shares mountable again?
After trying to debug this issue with rpcdebug to no avail, I resorted to dumping traffic on the NFS server coming from one of the nodes. The dump gave an interesting lead:
NFS reply xid 4168498669 reply ERR 20: Auth Bogus Credentials (seal broken)
So the issue was certainly not related to the network or AppArmor.
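For reference, a capture like the one above can be taken on the server with rpcdebug and tcpdump; the interface name and node address below are placeholders for whatever your setup uses:
rpcdebug -m nfsd -s all                              # enable nfsd debug messages (they land in the kernel log)
tcpdump -i eth0 -n host 192.168.11.3 and port 2049   # watch NFS traffic from a single node
rpcdebug -m nfsd -c all                              # switch the debug messages off again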
Then I changed the exports entry to
/nfs *(rw,sync,no_wdelay,root_squash,insecure,no_subtree_check,fsid=0)
and everything worked, confirming that the problem was some sort of exports misconfiguration.
Rewriting the rule to
/nfs 192.168.11.0/24(rw,sync,no_wdelay,root_squash,insecure,no_subtree_check,fsid=0)
restored connectivity.
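After editing /etc/exports, the change can be applied and double-checked without restarting the whole NFS server:
exportfs -ra    # re-read /etc/exports and re-export everything
exportfs -v     # print the active export table with its effective options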
According to https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/deployment_guide/s1-nfs-server-config-exports
wildcards — Where a * or ? character is used to take into account a grouping of fully qualified domain names that match a particular string of letters. Wildcards should not be used with IP addresses; however, it is possible for them to work accidentally if reverse DNS lookups fail.
So using * with an IP address was a clear misconfiguration that had somehow worked for months and finally resulted in the errors described in the question.
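If you want to check whether you are in that "accidentally working" case, see whether the NFS server can reverse-resolve the client addresses; the node IP here is just an example:
dig -x 192.168.11.3 +short    # empty output means there is no PTR record for the node
host 192.168.11.3             # alternative check with the host utility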