I think I have a DNS problem.
I just set up a new Rocky Linux 9.5 system with an autofs NFS-mounted home directory. This machine is referred to below as nfsclient.white.lan.
It sporadically hangs for 20 seconds when opening a file, writing, or doing anything else over NFS.
It looks like it occurs when autofs re-mounts after the mount is “Deactivated successfully”.
I have another client that mounts the server just fine.
I think the problem has something to do with the fact that the DNS server is not authoritative for the NFS server or client.
The machines in question are:
nfsclient.white.lan: 10.10.10.201
nfsserver.white.lan: 10.10.10.17
dnsserver.black.lan: 10.10.10.72
dnsserver.white.lan: 10.10.10.1
So the DNS server being used is only authoritative for black.lan and has to forward to 10.10.10.1.
I don’t see why this should be a problem.
If I do dig nfsserver.white.lan, it is instantly successful.
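Note that dig asks only for the A record by default, while the capture further down shows the client asking for both A and AAAA. A rough sketch for timing each kind of lookup through the system resolver, assuming Python 3 is available on the client (the function names time_lookup and compare_families are made up for illustration, and whether mount.nfs takes the same resolver path is an assumption):

```python
import socket
import time


def time_lookup(host, family, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, result) for one lookup via the system resolver.

    result is a sorted list of addresses, or an "error: ..." string.
    """
    start = time.monotonic()
    try:
        infos = resolver(host, None, family, socket.SOCK_STREAM)
        result = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result = f"error: {exc}"
    return time.monotonic() - start, result


def compare_families(host, resolver=socket.getaddrinfo):
    """Time an IPv4-only, an IPv6-only and an unspecified-family lookup."""
    families = [
        ("A only", socket.AF_INET),
        ("AAAA only", socket.AF_INET6),
        ("A + AAAA", socket.AF_UNSPEC),
    ]
    return {label: time_lookup(host, family, resolver) for label, family in families}
```

Running compare_families("nfsserver.white.lan") on nfsclient and printing the result would show whether only the lookups that involve AAAA are slow.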
But I got a packet capture which shows this DNS traffic:
DNS: 10.10.10.201 > 10.10.10.72: A nfsserver.white.lan
DNS: 10.10.10.201 > 10.10.10.72: AAAA nfsserver.white.lan
DNS: 10.10.10.72 > 10.10.10.201: A nfsserver.white.lan success: 10.10.10.17
5 seconds later …
ARP: Who has 10.10.10.201 tell 10.10.10.72
ARP: 10.10.10.201 is at ..a9
DNS: 10.10.10.201 > 10.10.10.72: A nfsserver.white.lan
DNS: 10.10.10.201 > 10.10.10.72: AAAA nfsserver.white.lan
DNS: 10.10.10.72 > 10.10.10.201: A nfsserver.white.lan success: 10.10.10.17
ARP: Who has 10.10.10.72 tell 10.10.10.201
ARP: 10.10.10.72 is at ..6d
4 seconds later …
DNS: 10.10.10.72 > 10.10.10.201: "Server failure": AAAA nfsserver.white.lan
DNS: 10.10.10.201 > 10.10.10.72: A nfsserver.white.lan
DNS: 10.10.10.201 > 10.10.10.72: AAAA nfsserver.white.lan
DNS: 10.10.10.72 > 10.10.10.201: A nfsserver.white.lan success: 10.10.10.17
and two more retries …
after 20 seconds in total …
ICMP: 10.10.10.201 > 10.10.10.72: Destination unreachable
NFS: SYN / ACK ... success
So it looks like there are a few odd things going on here:
ICMP is blocked (presumably firewalled but I don’t see why this should be an issue)
AAAA lookups are failing (normal - no IPv6 here yet)
ARP and DNS retransmissions seem to indicate that DNS responses are not being received by 10.10.10.201?
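To check the AAAA behaviour without involving NFS at all, the two query types can be sent straight to 10.10.10.72 and timed separately. A minimal sketch, assuming Python 3 on the client and plain UDP to port 53; the helper names (build_query, rcode_of, timed_query) are hypothetical and the response parsing only reads the status code from the header:

```python
import socket
import struct
import time

QTYPE = {"A": 1, "AAAA": 28}
RCODE = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype, query_id=0x1234):
    """Build a single-question DNS query with recursion desired."""
    # Header: id, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    # Question: name, type, class IN (1)
    return header + qname + struct.pack("!HH", QTYPE[qtype], 1)


def rcode_of(response):
    """Extract the response code from the low 4 bits of the flags field."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODE.get(code, str(code))


def timed_query(server, name, qtype, timeout=10.0):
    """Send one query and return (elapsed_seconds, status)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_query(name, qtype), (server, 53))
        try:
            data, _ = sock.recvfrom(4096)
            status = rcode_of(data)
        except socket.timeout:
            status = "timeout"
        return time.monotonic() - start, status
```

Comparing timed_query("10.10.10.72", "nfsserver.white.lan", "A") with the same call for "AAAA" would show whether the forwarder answers the AAAA query late (or with SERVFAIL), which is what the capture suggests.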
Could this be an SELinux thing?
Note: I can’t just change DNS to dnsserver.white.lan because dnsserver.black.lan is Windows DNS which has the SRV records and such needed for doing Kerberos with Windows KDCs.
UPDATE 1:
After disabling SELinux and turning off firewalld, the problem was not resolved.
However, after changing /etc/auto.nfs from:
user1 -fstype=nfs,rw nfsserver.white.lan:/d0/user1
to reference the server by IP address:
user1 -fstype=nfs,rw 10.10.10.17:/d0/user1
Now it works. Not ideal. But technically it’s working.
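If more map entries need the same workaround, the rewrite can be scripted. A small sketch, assuming each entry has the simple single-location form "key options host:/path" as above; pin_map_entry is a made-up helper name, and it uses socket.gethostbyname, which only does IPv4 lookups:

```python
import socket


def pin_map_entry(line, resolver=socket.gethostbyname):
    """Rewrite 'key options host:/path' so that host becomes an IPv4 address."""
    key, options, location = line.split(None, 2)
    host, path = location.strip().split(":", 1)
    return f"{key} {options} {resolver(host)}:{path}"
```

For example, pin_map_entry("user1 -fstype=nfs,rw nfsserver.white.lan:/d0/user1") should produce the IP-based entry shown above, provided the name resolves to 10.10.10.17 on the machine where it runs.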
Something is blocking NFS from getting DNS responses. Presumably it’s some kind of obscure security feature of NFS or some RPC service or maybe autofs is somehow influencing things or …