Last week we had an issue with our NFS server, which runs Rocky 9.5 (kernel 5.14.0-503.21.1.el9_5.x86_64). The server was stuck showing this kernel hung-task warning on the console (screenshot attached):
INFO: task nfsd:1961 blocked for more than 122 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
We were able to SSH into the server and found no OS-related issues; CPU and memory usage were normal. However, the NFS service was unavailable to several users, and eventually we had to reboot the VM to fix it. How can we avoid this kernel-related error in the future? Any help would be appreciated.
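The console lines above come from the kernel's hung-task detector: it reports a task (here nfsd, PID 1961) that has stayed in uninterruptible sleep longer than hung_task_timeout_secs. Echoing 0 into that file only turns the warning off; it does not unblock the task. If you want to alert on these reports before users notice, a minimal sketch could scan kernel log text (for example the output of journalctl -k or dmesg) and extract the task name, PID and blocked time. The regex is an assumption based on the message format shown above, and find_hung_tasks / HungTask are hypothetical names, not part of any existing tool.

```python
import re
from typing import NamedTuple

# Assumed format, taken from the console output above:
# "INFO: task nfsd:1961 blocked for more than 122 seconds."
# The non-greedy comm group lets names that contain colons
# (e.g. "kworker/u16:2") still split correctly from the PID.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


class HungTask(NamedTuple):
    comm: str     # task (command) name, e.g. "nfsd"
    pid: int      # process ID reported by the kernel
    seconds: int  # how long the task had been blocked


def find_hung_tasks(log_text: str) -> list[HungTask]:
    """Return every hung-task report found in a block of kernel log text."""
    return [
        HungTask(m["comm"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]
```

A monitoring job could feed it the captured kernel log and raise an alert whenever the returned list is non-empty, especially for nfsd entries.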
The “stable” in Enterprise Linux is largely about updates not breaking things: you can install a server and run dnf up regularly to keep it secure for a decade. Granted, feature updates may require rebuilding binaries against the updated libraries.
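One practical detail: dnf up installs a new kernel, but the machine keeps running the old one until it reboots. On EL systems, dnf needs-restarting -r (from dnf-plugins-core) reports whether a reboot is needed. The Python sketch below only illustrates the idea: it compares the running release (as from uname -r) with a list of installed kernel releases (as from rpm -q kernel). It assumes a simplified ordering on the numeric fields, which is not full rpm version comparison, and its function names are hypothetical.

```python
import re


def release_key(release: str) -> tuple[int, ...]:
    """Turn a kernel release such as '5.14.0-503.21.1.el9_5.x86_64' into a
    tuple of integers for rough ordering.

    Simplification (assumption): only numeric fields matter, and the
    architecture suffix is the same on one host, so it is dropped.
    This is NOT rpm's real version comparison.
    """
    base = re.sub(r"\.(x86_64|aarch64|ppc64le|s390x)$", "", release)
    return tuple(int(n) for n in re.findall(r"\d+", base))


def newer_kernel_installed(running: str, installed: list[str]) -> bool:
    """True if some installed kernel sorts after the running one,
    i.e. a reboot would be needed to actually run it."""
    running_key = release_key(running)
    return any(release_key(k) > running_key for k in installed)
```

In practice the dnf plugin is the safer check; the sketch just shows why a fully updated package set can still leave an older kernel running.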
Two things about that:
A VM tends to reboot very quickly (compared to a bare-metal POST), and NFS clients ought to cope with such short pauses.
A VM can be dedicated to one task (here, NFS). Such a server should not have many dependencies?
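Whether NFS clients cope with a short server reboot depends largely on the mount option. With hard (the NFS default), the client keeps retrying and applications simply block until the server is back. With soft, the client gives up after retrans retransmissions and returns an error to the application. As a minimal sketch, assuming you want to audit clients for this, the hypothetical helper below reads /proc/mounts-style text and lists NFS mount points that use soft.

```python
def soft_nfs_mounts(proc_mounts: str) -> list[str]:
    """Return mount points of NFS mounts that use the 'soft' option.

    Expects /proc/mounts format: device, mount point, fs type, options, ...
    (spaces in paths are octal-escaped there, so split() is safe).
    """
    flagged = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        # Exact match, so options like "softreval" are not flagged.
        if fstype in ("nfs", "nfs4") and "soft" in options.split(","):
            flagged.append(mountpoint)
    return flagged
```

On a client, you could pass it the contents of /proc/mounts; any mount it returns may surface I/O errors to applications during a server reboot instead of just pausing.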