Kernel error for NFS

On some of our NFS clients (compute servers) we get many messages like the following, several times a minute but not at a fixed interval:

[41467.979249] NFS: state manager: check lease failed on NFSv4 server 10.112.170.123 with error 13
[41470.028683] NFS: state manager: check lease failed on NFSv4 server 10.112.84.146 with error 13
[41471.054432] NFS: state manager: check lease failed on NFSv4 server 10.112.235.48 with error 13
[41560.283462] __nfs4_reclaim_open_state: 56 callbacks suppressed
[41560.283477] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[41560.283496] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[41560.283496] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[41560.299965] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[41560.299975] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[41560.299975] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[41560.333832] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[41560.333841] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[41560.333843] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[41560.334908] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
[198551.966363] nfs4_schedule_state_manager: kthread_run: -4
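The numbers in these messages are Linux errno values. In the kernel sources I have looked at, the state-manager message prints the negated return status, so "error 13" would correspond to -EACCES, and "kthread_run: -4" would be -EINTR. Below is a minimal sketch that decodes them with Python's standard errno module. It assumes a Linux host (errno numbers are platform-specific), and the helper name `decode_kernel_error` is hypothetical:

```python
import errno
import os
import re

# Hypothetical helper: extract the number from the two message shapes seen
# in the log above and map it to its symbolic errno name.
# Assumption: running on Linux, where errno 13 is EACCES and 4 is EINTR.
PATTERNS = [
    re.compile(r"with error (-?\d+)"),    # "check lease failed ... with error 13"
    re.compile(r"kthread_run: (-?\d+)"),  # "nfs4_schedule_state_manager: kthread_run: -4"
]


def decode_kernel_error(line):
    """Return (errno name, description) for an NFS log line, or None."""
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            code = abs(int(match.group(1)))  # the kernel may print the value negated
            return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)
    return None


if __name__ == "__main__":
    for line in [
        "NFS: state manager: check lease failed on NFSv4 server 10.112.170.123 with error 13",
        "nfs4_schedule_state_manager: kthread_run: -4",
    ]:
        print(decode_kernel_error(line))
```

The "Lock reclaim failed!" lines carry no number, so the helper returns None for them.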

I have no idea what could be causing these messages. They started suddenly yesterday evening, at nearly the same time on the affected NFS clients. dnf-automatic runs at night, so the NFS problem did not start immediately after the package updates.

I have configured nfsd on the file server to use 256 threads, which I thought should be enough. Could too few threads cause these error messages?
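Before tuning anything, it may be worth confirming that the server is actually running the configured 256 threads. A minimal sketch, assuming it runs on the NFS server and that the nfsd control filesystem is mounted at its usual location, /proc/fs/nfsd (the function name is hypothetical):

```python
from pathlib import Path

# Assumption: run on the NFS server, where the nfsd control filesystem is
# mounted at /proc/fs/nfsd. Its "threads" file holds the number of running
# nfsd threads. The path is a parameter so it can be pointed elsewhere.
NFSD_THREADS_FILE = Path("/proc/fs/nfsd/threads")


def read_nfsd_threads(path=NFSD_THREADS_FILE):
    """Return the number of running nfsd threads, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Missing file (nfsd not loaded, not a server) or unexpected content.
        return None


if __name__ == "__main__":
    count = read_nfsd_threads()
    print("nfsd threads:", "unavailable" if count is None else count)
```

From a shell, `cat /proc/fs/nfsd/threads` on the server gives the same number.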

What is the exact Rocky version, and which updates exactly were installed “last night”, before the errors started?

Rocky Linux 8.10
Before 41467.979249

Running mountstats /home/project/ gives the following output:

 NFS mount options: rw,vers=4.1,rsize=131072,wsize=131072,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.201,local_lock=none
  NFS server capabilities: caps=0xc003c037,wtmult=512,dtsize=131072,bsize=0,namlen=255
  NFSv4 capability flags: bm0=0xfdffafff,bm1=0xf9be3e,bm2=0x800,acl=0x0,sessions,pnfs=notconfigured,lease_time=90,lease_expired=0
  NFS security flavor: 1  pseudoflavor: 0

NFS byte counts:
  applications read 91138160914 bytes via read(2)
  applications wrote 8488243992 bytes via write(2)
  applications read 0 bytes via O_DIRECT read(2)
  applications wrote 0 bytes via O_DIRECT write(2)
  client read 25485879385 bytes via NFS READ
  client wrote 8703133269 bytes via NFS WRITE
RPC statistics:
  405933759 RPC requests sent, 405933535 RPC replies received (9 XIDs not found)
  average backlog queue length: 0

TEST_STATEID:
        79902424 ops (19%)
        avg bytes sent per op: 154      avg bytes received per op: 96
        backlog wait: 0.002438  RTT: 0.067541   total execute time: 0.079290 (milliseconds)
GETATTR:
        2753587 ops (0%)        9 retrans (0%)  3 major timeouts        9 errors (0%)
        avg bytes sent per op: 202      avg bytes received per op: 239
        backlog wait: 1.472777  RTT: 0.203039   total execute time: 1.781585 (milliseconds)
READ:
        471135 ops (0%)
        avg bytes sent per op: 217      avg bytes received per op: 54198
        backlog wait: 0.011351  RTT: 0.384162   total execute time: 0.409063 (milliseconds)
LOCK:
        398056 ops (0%)
        avg bytes sent per op: 256      avg bytes received per op: 112
        backlog wait: 0.005894  RTT: 0.073892   total execute time: 0.083169 (milliseconds)
LOCKU:
        382289 ops (0%)
        avg bytes sent per op: 231      avg bytes received per op: 112
        backlog wait: 0.005697  RTT: 0.068511   total execute time: 0.077025 (milliseconds)
CLOSE:
        329587 ops (0%)         336 errors (0%)
        avg bytes sent per op: 215      avg bytes received per op: 132
        backlog wait: 0.010355  RTT: 0.086681   total execute time: 0.101400 (milliseconds)
STATFS:
        291114 ops (0%)         3 retrans (0%)  2 major timeouts        3 errors (0%)
        avg bytes sent per op: 201      avg bytes received per op: 159
        backlog wait: 2.771375  RTT: 0.420619   total execute time: 3.361285 (milliseconds)
OPEN_NOATTR:
        282704 ops (0%)         251 errors (0%)
        avg bytes sent per op: 270      avg bytes received per op: 345
        backlog wait: 0.015415  RTT: 0.120179   total execute time: 0.147515 (milliseconds)
DELEGRETURN:
        231112 ops (0%)         2 errors (0%)
        avg bytes sent per op: 221      avg bytes received per op: 164
        backlog wait: 2.566751  RTT: 0.228348   total execute time: 2.854495 (milliseconds)
FREE_STATEID:
        189152 ops (0%)         26 errors (0%)
        avg bytes sent per op: 151      avg bytes received per op: 88
        backlog wait: 0.005726  RTT: 0.069082   total execute time: 0.077673 (milliseconds)
WRITE:
        165595 ops (0%)
        avg bytes sent per op: 52796    avg bytes received per op: 179
        backlog wait: 67.448908         RTT: 1.263474   total execute time: 68.724521 (milliseconds)
LOOKUP:
        134782 ops (0%)         60776 errors (45%)
        avg bytes sent per op: 229      avg bytes received per op: 212
        backlog wait: 0.003903  RTT: 0.106550   total execute time: 0.118376 (milliseconds)
ACCESS:
        106224 ops (0%)
        avg bytes sent per op: 208      avg bytes received per op: 164
        backlog wait: 0.002787  RTT: 0.077911   total execute time: 0.087701 (milliseconds)
SETATTR:
        93580 ops (0%)
        avg bytes sent per op: 249      avg bytes received per op: 264
        backlog wait: 0.003986  RTT: 0.388609   total execute time: 0.400748 (milliseconds)
READDIR:
        79308 ops (0%)
        avg bytes sent per op: 225      avg bytes received per op: 1656
        backlog wait: 0.002446  RTT: 0.360884   total execute time: 0.370127 (milliseconds)
OPEN:
        76777 ops (0%)  23654 errors (30%)
        avg bytes sent per op: 320      avg bytes received per op: 292
        backlog wait: 0.009873  RTT: 0.188924   total execute time: 0.203798 (milliseconds)
REMOVE:
        42665 ops (0%)  66 errors (0%)
        avg bytes sent per op: 219      avg bytes received per op: 115
        backlog wait: 0.004149  RTT: 0.180898   total execute time: 0.191703 (milliseconds)
COMMIT:
        21540 ops (0%)
        avg bytes sent per op: 203      avg bytes received per op: 104
        backlog wait: 0.161142  RTT: 0.193593   total execute time: 0.364206 (milliseconds)
RENAME:
        15875 ops (0%)  313 errors (1%)
        avg bytes sent per op: 305      avg bytes received per op: 151
        backlog wait: 156.135181        RTT: 0.226772   total execute time: 156.365669 (milliseconds)
CREATE:
        10213 ops (0%)  265 errors (2%)
        avg bytes sent per op: 251      avg bytes received per op: 333
        backlog wait: 0.003525  RTT: 0.188975   total execute time: 0.201410 (milliseconds)
LINK:
        6901 ops (0%)   16 errors (0%)
        avg bytes sent per op: 307      avg bytes received per op: 291
        backlog wait: 0.002463  RTT: 0.173888   total execute time: 0.184031 (milliseconds)
OPEN_DOWNGRADE:
        3382 ops (0%)   4 errors (0%)
        avg bytes sent per op: 219      avg bytes received per op: 111
        backlog wait: 0.016263  RTT: 0.089888   total execute time: 0.110881 (milliseconds)
SYMLINK:
        497 ops (0%)    2 errors (0%)
        avg bytes sent per op: 285      avg bytes received per op: 339
        backlog wait: 0.002012  RTT: 0.195171   total execute time: 0.205231 (milliseconds)
READLINK:
        91 ops (0%)
        avg bytes sent per op: 190      avg bytes received per op: 136
        backlog wait: 0.000000  RTT: 0.384615   total execute time: 0.395604 (milliseconds)
BIND_CONN_TO_SESSION:
        28 ops (0%)
        avg bytes sent per op: 116      avg bytes received per op: 68
        backlog wait: 0.000000  RTT: 0.107143   total execute time: 0.142857 (milliseconds)
EXCHANGE_ID:
        3 ops (0%)
        avg bytes sent per op: 256      avg bytes received per op: 100
        backlog wait: 0.000000  RTT: 8.000000   total execute time: 8.000000 (milliseconds)
CREATE_SESSION:
        3 ops (0%)      1 errors (33%)
        avg bytes sent per op: 208      avg bytes received per op: 97
        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
SERVER_CAPS:
        2 ops (0%)
        avg bytes sent per op: 180      avg bytes received per op: 160
        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
SEQUENCE:
        2 ops (0%)
        avg bytes sent per op: 128      avg bytes received per op: 80
        backlog wait: 118339.500000     RTT: 0.000000   total execute time: 118339.500000 (milliseconds)
RECLAIM_COMPLETE:
        2 ops (0%)
        avg bytes sent per op: 132      avg bytes received per op: 88
        backlog wait: 0.000000  RTT: 77.000000  total execute time: 77.000000 (milliseconds)
NULL:
        1 ops (0%)
        avg bytes sent per op: 44       avg bytes received per op: 24
        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
FSINFO:
        1 ops (0%)
        avg bytes sent per op: 176      avg bytes received per op: 152
        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
PATHCONF:
        1 ops (0%)
        avg bytes sent per op: 172      avg bytes received per op: 116
        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
DESTROY_SESSION:
        1 ops (0%)      1 errors (100%)
        avg bytes sent per op: 108      avg bytes received per op: 44
        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
GET_LEASE_TIME:
        1 ops (0%)
        avg bytes sent per op: 140      avg bytes received per op: 112
        backlog wait: 0.000000  RTT: 3.000000   total execute time: 3.000000 (milliseconds)
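With this many operations it helps to filter the output mechanically. Below is a minimal sketch of a parser for the per-operation section, assuming the layout shown above: an "OPNAME:" line at column 0 followed by indented counter lines. The helper names and the 10% error threshold are my own choices, not part of mountstats.

```python
import re

# Hypothetical helpers for the per-operation part of `mountstats` output:
# an "OPNAME:" line at column 0, then indented lines carrying "N ops",
# an optional "N errors", and the three timing fields (milliseconds).
OP_HEADER = re.compile(r"^([A-Z_]+):\s*$")
OPS = re.compile(r"(\d+) ops\b")
ERRORS = re.compile(r"(\d+) errors\b")
TIMES = re.compile(
    r"backlog wait: ([\d.]+)\s+RTT: ([\d.]+)\s+total execute time: ([\d.]+)"
)
LEASE = re.compile(r"lease_time=(\d+)")


def parse_lease_seconds(text):
    """Return lease_time (seconds) from the capability-flags line, or None."""
    match = LEASE.search(text)
    return int(match.group(1)) if match else None


def parse_per_op(text):
    """Return {op: {"ops", "errors", "backlog_ms", "rtt_ms", "exec_ms"}}."""
    stats = {}
    current = None
    for line in text.splitlines():
        header = OP_HEADER.match(line)
        if header:
            current = stats.setdefault(header.group(1), {
                "ops": 0, "errors": 0,
                "backlog_ms": 0.0, "rtt_ms": 0.0, "exec_ms": 0.0,
            })
            continue
        if current is None:
            continue  # still in the mount-options / byte-count part
        ops = OPS.search(line)
        if ops:
            current["ops"] = int(ops.group(1))
        errors = ERRORS.search(line)
        if errors:
            current["errors"] = int(errors.group(1))
        times = TIMES.search(line)
        if times:
            backlog, rtt, total = (float(v) for v in times.groups())
            current.update(backlog_ms=backlog, rtt_ms=rtt, exec_ms=total)
    return stats


def flag_suspicious(stats, lease_seconds, error_ratio=0.10):
    """Yield (op, reason) for ops worth a closer look.

    The 10% error threshold is an arbitrary illustration, not a known limit.
    """
    for op, s in stats.items():
        if s["ops"] and s["errors"] / s["ops"] >= error_ratio:
            yield op, f"{s['errors']} of {s['ops']} ops returned an error"
        if lease_seconds is not None and s["backlog_ms"] / 1000 > lease_seconds:
            yield op, (f"average backlog wait {s['backlog_ms'] / 1000:.1f} s "
                       f"exceeds the {lease_seconds} s lease")


# Excerpt of the output above, used as sample input.
SAMPLE = """\
  NFSv4 capability flags: bm0=0xfdffafff,bm1=0xf9be3e,bm2=0x800,acl=0x0,sessions,pnfs=notconfigured,lease_time=90,lease_expired=0
READ:
        471135 ops (0%)
        backlog wait: 0.011351  RTT: 0.384162   total execute time: 0.409063 (milliseconds)
LOOKUP:
        134782 ops (0%)         60776 errors (45%)
        backlog wait: 0.003903  RTT: 0.106550   total execute time: 0.118376 (milliseconds)
SEQUENCE:
        2 ops (0%)
        backlog wait: 118339.500000     RTT: 0.000000   total execute time: 118339.500000 (milliseconds)
"""

if __name__ == "__main__":
    lease = parse_lease_seconds(SAMPLE)
    for op, reason in flag_suspicious(parse_per_op(SAMPLE), lease):
        print(f"{op}: {reason}")
```

To run it on the full output, save it first (for example `mountstats /home/project/ > mountstats.txt`) and pass the file's contents instead of SAMPLE. A high LOOKUP error count can be normal, for example when programs probe for files that do not exist; the counter alone does not say which error the server returned.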

What looks strange to me is the SEQUENCE entry:

SEQUENCE:
        2 ops (0%)
        avg bytes sent per op: 128      avg bytes received per op: 80
        backlog wait: 118339.500000     RTT: 0.000000   total execute time: 118339.500000 (milliseconds)
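To put these figures in perspective, compare them with lease_time=90 (seconds) from the capability flags above, assuming, as the label says, that backlog wait is an average per op in milliseconds:

$$
\bar{t}_{\mathrm{backlog}} = 118339.5~\mathrm{ms} \approx 118.3~\mathrm{s} > t_{\mathrm{lease}} = 90~\mathrm{s},
\qquad
2 \times 118339.5~\mathrm{ms} = 236679~\mathrm{ms} \approx 236.7~\mathrm{s}
$$

So each of the two SEQUENCE calls, on average, waited on the client for longer than one full lease period before being sent (its reported RTT is 0), which would fit with the lease-check and reclaim messages in the kernel log.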

Does anyone know how to debug this problem?
The network looks normal to me:

team0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.105  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::2da0:198c:66cc:c944  prefixlen 64  scopeid 0x20<link>
        ether 34:73:79:2a:8b:57  txqueuelen 1000  (Ethernet)
        RX packets 482720667  bytes 210181730388 (195.7 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 479067645  bytes 560939843606 (522.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
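ifconfig only shows the totals for team0. As far as I know, physical-layer problems such as CRC errors are counted on the underlying member NICs rather than on the team device, so it may be worth checking every interface. A minimal sketch that reads the standard Linux counters from /sys/class/net/<iface>/statistics and reports the nonzero ones (the helper names and the choice of counters are my own):

```python
from pathlib import Path

# Standard Linux per-interface counters, one file each under
# /sys/class/net/<iface>/statistics/. Which counters to report is an
# arbitrary choice for this sketch.
COUNTERS = (
    "rx_errors", "rx_dropped", "rx_crc_errors", "rx_frame_errors",
    "tx_errors", "tx_dropped", "tx_carrier_errors", "collisions",
)


def read_counters(iface, root="/sys/class/net"):
    """Return {counter: value} for the selected counters that exist for iface."""
    stats_dir = Path(root) / iface / "statistics"
    values = {}
    for name in COUNTERS:
        path = stats_dir / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


def nonzero_counters(root="/sys/class/net"):
    """Return {iface: {counter: value}} for interfaces with any nonzero counter.

    Covers every interface (team0 and its member ports alike), since
    physical-layer errors are, as far as I know, counted on the member NICs.
    """
    report = {}
    base = Path(root)
    if not base.is_dir():
        return report
    for iface_dir in sorted(base.iterdir()):
        nonzero = {k: v for k, v in read_counters(iface_dir.name, root).items() if v}
        if nonzero:
            report[iface_dir.name] = nonzero
    return report


if __name__ == "__main__":
    for iface, counters in nonzero_counters().items():
        print(iface, counters)
```

The counters are cumulative since boot, so comparing two runs taken a few minutes apart says more than a single snapshot.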
