RockyLinux 9.5 nfs-server unstable?

Hi

Since the upgrade to Rocky Linux 9.5, our Linux servers (VM or physical) frequently hard reboot, or the nfs-server service simply stops. I already reinstalled one server, and it was stable over the weekend with zero to light NFSv4.2 traffic. Today, with more going on, it has already crashed three times, and two of those crashes were only 20 minutes apart. With Rocky Linux 9.4 everything was rock(y) stable with the same config that I use now.

Any idea what that could be?

[251118.198708] perf: interrupt took too long (2524 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[256533.825978] perf: interrupt took too long (3166 > 3155), lowering kernel.perf_event_max_sample_rate to 63000
[279965.977293] perf: interrupt took too long (3965 > 3957), lowering kernel.perf_event_max_sample_rate to 50000
[326621.176722] ------------[ cut here ]------------
[326621.176728] WARNING: CPU: 18 PID: 3270 at mm/slab_common.c:957 free_large_kmalloc+0x5a/0x80
[326621.176739] Modules linked in: tls binfmt_misc dm_service_time iscsi_tcp libiscsi_tcp libiscsi rpcrdma rdma_cm iw_cm ib_cm ib_core scsi_transport_iscsi nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vfat fat dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif dm_mod kvm dell_wmi_descriptor sparse_keymap rfkill video iTCO_wdt rapl intel_cstate mxm_wmi mei_me dcdbas mei intel_uncore iTCO_vendor_support ipmi_si joydev acpi_power_meter ipmi_devintf ipmi_msghandler pcspkr lpc_ich nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg mgag200 uas usb_storage drm_kms_helper ahci libahci drm_shmem_helper crct10dif_pclmul crc32_pclmul drm ixgbe crc32c_intel libata
[326621.176795]  igb ghash_clmulni_intel megaraid_sas i2c_algo_bit mdio dca wmi fuse
[326621.176801] CPU: 18 PID: 3270 Comm: nfsd Kdump: loaded Not tainted 5.14.0-503.14.1.el9_5.x86_64 #1
[326621.176804] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.19.0 12/12/2023
[326621.176806] RIP: 0010:free_large_kmalloc+0x5a/0x80
[326621.176811] Code: da 9c 5b fa be 06 00 00 00 48 89 ef e8 af 25 0a 00 80 e7 02 74 01 fb 48 83 c4 08 44 89 e6 48 89 ef 5b 5d 41 5c e9 d6 28 04 00 <0f> 0b 45 31 e4 80 3d 43 0e fc 01 00 ba 00 f0 ff ff 0f 84 fb 9a 90
[326621.176813] RSP: 0018:ffffb6f9092bb968 EFLAGS: 00010246
[326621.176815] RAX: 0017ffffc0001000 RBX: ffffffff8c31e2e0 RCX: ffff94a6c40f9220
[326621.176816] RDX: fffff42b8cb69608 RSI: ffffffff8b058378 RDI: fffff42b8cb69600
[326621.176818] RBP: fffff42b8cb69600 R08: ffffffff8ca06440 R09: ffff94a9afc744b0
[326621.176819] R10: 00000000000003c8 R11: 0000000000000000 R12: ffffffff8b058378
[326621.176820] R13: 0000000000000000 R14: ffff94a68942ae00 R15: ffff94a9e567c000
[326621.176822] FS:  0000000000000000(0000) GS:ffff94a9afc40000(0000) knlGS:0000000000000000
[326621.176824] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[326621.176825] CR2: 00005626630c5140 CR3: 000000032f410001 CR4: 00000000003706f0
[326621.176827] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[326621.176828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[326621.176829] Call Trace:
[326621.176831]  <TASK>
[326621.176833]  ? show_trace_log_lvl+0x1c4/0x2df
[326621.176839]  ? show_trace_log_lvl+0x1c4/0x2df
[326621.176842]  ? security_release_secctx+0x28/0x40
[326621.176846]  ? free_large_kmalloc+0x5a/0x80
[326621.176849]  ? __warn+0x7e/0xd0
[326621.176852]  ? free_large_kmalloc+0x5a/0x80
[326621.176855]  ? report_bug+0x100/0x140
[326621.176859]  ? handle_bug+0x3c/0x70
[326621.176862]  ? exc_invalid_op+0x14/0x70
[326621.176864]  ? asm_exc_invalid_op+0x16/0x20
[326621.176868]  ? lookup_dcache+0x18/0x60
[326621.176872]  ? lookup_dcache+0x18/0x60
[326621.176875]  ? free_large_kmalloc+0x5a/0x80
[326621.176878]  ? lookup_dcache+0x18/0x60
[326621.176880]  security_release_secctx+0x28/0x40
[326621.176883]  nfsd4_encode_fattr4+0x2cc/0x4f0 [nfsd]
[326621.176955]  ? avc_has_perm_noaudit+0x94/0x110
[326621.176959]  ? selinux_inode_permission+0x10e/0x1d0
[326621.176964]  ? __d_lookup+0x73/0xb0
[326621.176967]  ? d_lookup+0x29/0x50
[326621.176969]  ? lookup_dcache+0x18/0x60
[326621.176972]  nfsd4_encode_entry4_fattr+0xcd/0x1e0 [nfsd]
[326621.177019]  nfsd4_encode_entry4+0x100/0x290 [nfsd]
[326621.177072]  nfsd_buffered_readdir+0x144/0x250 [nfsd]
[326621.177114]  ? __pfx_nfsd4_encode_entry4+0x10/0x10 [nfsd]
[326621.177170]  ? __pfx_nfsd_buffered_filldir+0x10/0x10 [nfsd]
[326621.177211]  ? __pfx_nfsd4_encode_entry4+0x10/0x10 [nfsd]
[326621.177255]  nfsd_readdir+0xa9/0xe0 [nfsd]
[326621.177296]  nfsd4_encode_readdir+0xf8/0x1d0 [nfsd]
[326621.177341]  nfsd4_encode_operation+0xa6/0x2b0 [nfsd]
[326621.177386]  nfsd4_proc_compound+0x1d0/0x700 [nfsd]
[326621.177446]  nfsd_dispatch+0xe9/0x220 [nfsd]
[326621.177487]  svc_process_common+0x2e7/0x650 [sunrpc]
[326621.177583]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[326621.177623]  svc_process+0x12d/0x170 [sunrpc]
[326621.177691]  svc_handle_xprt+0x448/0x580 [sunrpc]
[326621.177750]  svc_recv+0x17a/0x2c0 [sunrpc]
[326621.177819]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[326621.177858]  nfsd+0x84/0xb0 [nfsd]
[326621.177896]  kthread+0xe0/0x100
[326621.177900]  ? __pfx_kthread+0x10/0x10
[326621.177904]  ret_from_fork+0x2c/0x50
[326621.177919]  </TASK>
[326621.177920] ---[ end trace 0000000000000000 ]---
[326621.177922] object pointer: 0x00000000e53caba2
[326621.179321] BUG: unable to handle page fault for address: ffff94a86da58000
[326621.179324] #PF: supervisor write access in kernel mode
[326621.179327] #PF: error_code(0x0003) - permissions violation
[326621.179330] PGD 330801067 P4D 330801067 PUD 100207063 PMD 800000032da000a1
[326621.179337] Oops: 0003 [#1] PREEMPT SMP PTI
[326621.179341] CPU: 18 PID: 3270 Comm: nfsd Kdump: loaded Tainted: G        W         -------  ---  5.14.0-503.14.1.el9_5.x86_64 #1
[326621.179345] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.19.0 12/12/2023
[326621.179347] RIP: 0010:svc_process_common+0xe7/0x650 [sunrpc]
[326621.179466] Code: 00 00 48 c7 87 80 02 00 00 00 00 00 00 48 29 d0 48 c1 f8 03 c1 e0 0c 89 87 cc 02 00 00 4c 89 e7 e8 ce a9 00 00 48 85 c0 74 02 <89> 18 be 04 00 00 00 4c 89 e7 e8 ba a9 00 00 48 85 c0 74 06 c7 00
[326621.179468] RSP: 0018:ffffb6f9092bbe28 EFLAGS: 00010286
[326621.179470] RAX: ffff94a86da58000 RBX: 000000000bc19c07 RCX: ffff94a86da58000
[326621.179471] RDX: ffff94a9e567c2e8 RSI: 0000000000000004 RDI: ffff94a9e567c238
[326621.179472] RBP: ffff94a9e567c000 R08: ffff94a9e567c1a0 R09: 0000000000000000
[326621.179473] R10: 0000000000000006 R11: 0000000000001000 R12: ffff94a9e567c238
[326621.179474] R13: ffff94a9c23f8f00 R14: ffff94a9c23f8784 R15: ffff94a9e567c000
[326621.179475] FS:  0000000000000000(0000) GS:ffff94a9afc40000(0000) knlGS:0000000000000000
[326621.179477] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[326621.179478] CR2: ffff94a86da58000 CR3: 000000032f410001 CR4: 00000000003706f0
[326621.179479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[326621.179480] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[326621.179481] Call Trace:
[326621.179483]  <TASK>
[326621.179484]  ? show_trace_log_lvl+0x1c4/0x2df
[326621.179488]  ? show_trace_log_lvl+0x1c4/0x2df
[326621.179492]  ? svc_process+0x12d/0x170 [sunrpc]
[326621.179547]  ? __die_body.cold+0x8/0xd
[326621.179551]  ? page_fault_oops+0x134/0x170
[326621.179554]  ? kernelmode_fixup_or_oops+0x84/0x110
[326621.179557]  ? exc_page_fault+0xa8/0x150
[326621.179561]  ? asm_exc_page_fault+0x22/0x30
[326621.179565]  ? svc_process_common+0xe7/0x650 [sunrpc]
[326621.179621]  ? svc_process_common+0xe2/0x650 [sunrpc]
[326621.179678]  svc_process+0x12d/0x170 [sunrpc]
[326621.179736]  svc_handle_xprt+0x448/0x580 [sunrpc]
[326621.179796]  svc_recv+0x17a/0x2c0 [sunrpc]
[326621.179856]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[326621.179896]  nfsd+0x84/0xb0 [nfsd]
[326621.179936]  kthread+0xe0/0x100
[326621.179940]  ? __pfx_kthread+0x10/0x10
[326621.179943]  ret_from_fork+0x2c/0x50
[326621.179947]  </TASK>
[326621.179948] Modules linked in: tls binfmt_misc dm_service_time iscsi_tcp libiscsi_tcp libiscsi rpcrdma rdma_cm iw_cm ib_cm ib_core scsi_transport_iscsi nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vfat fat dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif dm_mod kvm dell_wmi_descriptor sparse_keymap rfkill video iTCO_wdt rapl intel_cstate mxm_wmi mei_me dcdbas mei intel_uncore iTCO_vendor_support ipmi_si joydev acpi_power_meter ipmi_devintf ipmi_msghandler pcspkr lpc_ich nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg mgag200 uas usb_storage drm_kms_helper ahci libahci drm_shmem_helper crct10dif_pclmul crc32_pclmul drm ixgbe crc32c_intel libata
[326621.179989]  igb ghash_clmulni_intel megaraid_sas i2c_algo_bit mdio dca wmi fuse
[326621.179994] CR2: ffff94a86da58000

I can now also see this in the log: kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 160s! [nfsd:5657]

I upgraded a server running nfs-server to the 9.5 release this morning and am seeing the same behaviour:

Server is non-responsive, won’t ping.

It looks to have crashed a few minutes after the initial reboot that applied the updates, and again maybe two hours after I forcibly rebooted it (so twice in a few hours).

Now back on the previous kernel to see if this helps.

Error:

Dec  3 08:54:24  kernel: [ 3922.574999] WARNING: CPU: 2 PID: 1104 at mm/slab_common.c:957 free_large_kmalloc+0x5a/0x80
Dec  3 08:54:24  kernel: [ 3922.575035] Modules linked in: binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm ib_core rfkill nft_reject_ipv4 nf_reject_ipv4 nft_reject nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vsock_loopback vmw_vsock_virtio_transport_common nf_tables vmw_vsock_vmci_transport vsock nfnetlink vmwgfx intel_rapl_msr vmw_balloon drm_ttm_helper intel_rapl_common ttm drm_kms_helper pcspkr vmw_vmci i2c_piix4 joydev nfsd nfs_acl lockd auth_rpcgss grace drm sunrpc xfs libcrc32c sd_mod crct10dif_pclmul crc32_pclmul t10_pi crc32c_intel ata_generic sg ghash_clmulni_intel ata_piix vmw_pvscsi libata vmxnet3 serio_raw dm_mirror dm_region_hash dm_log dm_mod fuse
Dec  3 08:54:24  kernel: [ 3922.575080] CPU: 2 PID: 1104 Comm: nfsd Not tainted 5.14.0-503.15.1.el9_5.x86_64 #1
Dec  3 08:54:24  kernel: [ 3922.575082] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Dec  3 08:54:24  kernel: [ 3922.575084] RIP: 0010:free_large_kmalloc+0x5a/0x80
Dec  3 08:54:24  kernel: [ 3922.575087] Code: da 9c 5b fa be 06 00 00 00 48 89 ef e8 af 25 0a 00 80 e7 02 74 01 fb 48 83 c4 08 44 89 e6 48 89 ef 5b 5d 41 5c e9 d6 28 04 00 <0f> 0b 45 31 e4 80 3d d3 0d fc 01 00 ba 00 f0 ff ff 0f 84 8b 9a 90
Dec  3 08:54:24  kernel: [ 3922.575088] RSP: 0018:ffffc0fb0243fb28 EFLAGS: 00010246
Dec  3 08:54:24  kernel: [ 3922.575090] RAX: 0017ffffd0000014 RBX: ffffffffb431e2e0 RCX: ffff9f24e50a7db8
Dec  3 08:54:24  kernel: [ 3922.575091] RDX: fffff15a859f4d48 RSI: ffffffffc0b3647c RDI: fffff15a859f4d80
Dec  3 08:54:24  kernel: [ 3922.575092] RBP: fffff15a859f4d80 R08: ffffffffb4a07910 R09: ffff9f25b5eb44b0
Dec  3 08:54:24  kernel: [ 3922.575093] R10: 000000000000000c R11: ffff9f25b5eb1c64 R12: ffffffffc0b3647c
Dec  3 08:54:24  kernel: [ 3922.575094] R13: 0000000000000000 R14: ffff9f25607fa000 R15: ffff9f2492370000
Dec  3 08:54:24  kernel: [ 3922.575095] FS:  0000000000000000(0000) GS:ffff9f25b5e80000(0000) knlGS:0000000000000000
Dec  3 08:54:24  kernel: [ 3922.575096] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  3 08:54:24  kernel: [ 3922.575097] CR2: 000000c0005fc000 CR3: 00000001139cc005 CR4: 00000000007706f0
Dec  3 08:54:24  kernel: [ 3922.575111] PKRU: 55555554
Dec  3 08:54:24  kernel: [ 3922.575112] Call Trace:
Dec  3 08:54:24  kernel: [ 3922.575114]  <TASK>
Dec  3 08:54:24  kernel: [ 3922.575115]  ? srso_alias_return_thunk+0x5/0xfbef5
Dec  3 08:54:24  kernel: [ 3922.575122]  ? show_trace_log_lvl+0x26e/0x2df
Dec  3 08:54:24  kernel: [ 3922.575130]  ? show_trace_log_lvl+0x26e/0x2df
Dec  3 08:54:24  kernel: [ 3922.575134]  ? security_release_secctx+0x28/0x40
Dec  3 08:54:24  kernel: [ 3922.575137]  ? free_large_kmalloc+0x5a/0x80
Dec  3 08:54:24  kernel: [ 3922.575140]  ? __warn+0x7e/0xd0
Dec  3 08:54:24  kernel: [ 3922.575142]  ? free_large_kmalloc+0x5a/0x80
Dec  3 08:54:24  kernel: [ 3922.575145]  ? report_bug+0x100/0x140
Dec  3 08:54:24  kernel: [ 3922.575148]  ? handle_bug+0x3c/0x70
Dec  3 08:54:24  kernel: [ 3922.575155]  ? exc_invalid_op+0x14/0x70
Dec  3 08:54:24  kernel: [ 3922.575157]  ? asm_exc_invalid_op+0x16/0x20
Dec  3 08:54:24  kernel: [ 3922.575159]  ? _fh_update.part.0.isra.0+0x4c/0x90 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575213]  ? _fh_update.part.0.isra.0+0x4c/0x90 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575232]  ? free_large_kmalloc+0x5a/0x80
Dec  3 08:54:24  kernel: [ 3922.575235]  ? _fh_update.part.0.isra.0+0x4c/0x90 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575251]  security_release_secctx+0x28/0x40
Dec  3 08:54:24  kernel: [ 3922.575253]  nfsd4_encode_fattr4+0x2cc/0x4f0 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575276]  ? srso_alias_return_thunk+0x5/0xfbef5
Dec  3 08:54:24  kernel: [ 3922.575278]  ? __kmem_cache_alloc_node+0x18f/0x2e0
Dec  3 08:54:24  kernel: [ 3922.575281]  ? security_prepare_creds+0x71/0xa0
Dec  3 08:54:24  kernel: [ 3922.575285]  ? security_prepare_creds+0x71/0xa0
Dec  3 08:54:24  kernel: [ 3922.575286]  ? srso_alias_return_thunk+0x5/0xfbef5
Dec  3 08:54:24  kernel: [ 3922.575288]  ? __kmalloc+0x4b/0x140
Dec  3 08:54:24  kernel: [ 3922.575289]  ? srso_alias_return_thunk+0x5/0xfbef5
Dec  3 08:54:24  kernel: [ 3922.575291]  ? srso_alias_return_thunk+0x5/0xfbef5
Dec  3 08:54:24  kernel: [ 3922.575292]  ? security_prepare_creds+0x47/0xa0
Dec  3 08:54:24  kernel: [ 3922.575294]  ? srso_alias_return_thunk+0x5/0xfbef5
Dec  3 08:54:24  kernel: [ 3922.575296]  ? prepare_creds+0x180/0x270
Dec  3 08:54:24  kernel: [ 3922.575298]  ? srso_alias_return_thunk+0x5/0xfbef5
Dec  3 08:54:24  kernel: [ 3922.575300]  ? nfsd_setuser+0x110/0x270 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575320]  ? srso_alias_return_thunk+0x5/0xfbef5
Dec  3 08:54:24  kernel: [ 3922.575322]  ? nfsd_setuser_and_check_port+0x4a/0xc0 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575340]  ? _fh_update.part.0.isra.0+0x4c/0x90 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575364]  nfsd4_encode_getattr+0x2b/0x40 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575387]  nfsd4_encode_operation+0xa6/0x2b0 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575407]  nfsd4_proc_compound+0x1d0/0x700 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575426]  nfsd_dispatch+0xe9/0x220 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575447]  svc_process_common+0x2e7/0x650 [sunrpc]
Dec  3 08:54:24  kernel: [ 3922.575511]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575535]  svc_process+0x12d/0x170 [sunrpc]
Dec  3 08:54:24  kernel: [ 3922.575560]  svc_handle_xprt+0x448/0x580 [sunrpc]
Dec  3 08:54:24  kernel: [ 3922.575589]  svc_recv+0x17a/0x2c0 [sunrpc]
Dec  3 08:54:24  kernel: [ 3922.575613]  ? __pfx_nfsd+0x10/0x10 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575635]  nfsd+0x84/0xb0 [nfsd]
Dec  3 08:54:24  kernel: [ 3922.575660]  kthread+0xe0/0x100
Dec  3 08:54:24  kernel: [ 3922.575663]  ? __pfx_kthread+0x10/0x10
Dec  3 08:54:24  kernel: [ 3922.575666]  ret_from_fork+0x2c/0x50
Dec  3 08:54:24  kernel: [ 3922.575671]  </TASK>
Dec  3 08:54:24  kernel: [ 3922.575672] ---[ end trace 0000000000000000 ]---

Maybe related to:

https://bugs.rockylinux.org/view.php?id=8451

Yeah, that was me. I filed it to get more visibility.

It feels very much like the same thing. At least, dropping back to the previous kernel seems to have fixed the issue for now.

So does this affect everyone who is trying to run nfsd on RHEL 9.5?

I would say yes. But it depends on the workload. I have 6 servers. 2 are rebooting more or less every day. Others are running without any problem.

We’re seeing similar issues on a physical server shortly after upgrading from 9.4 to 9.5, and they correlate with heavy NFS usage:

Dec  9 11:09:02 cal-s03 kernel: ------------[ cut here ]------------
Dec  9 11:09:02 cal-s03 kernel: WARNING: CPU: 6 PID: 2682 at mm/slab_common.c:957 free_large_kmalloc+0x5a/0x80
Dec  9 11:09:02 cal-s03 kernel: Modules linked in: nf_conntrack_netlink 8021q garp mrp stp llc tls rpcrdma rdma_cm iw_cm ib_cm ib_core nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink dell_rbu vfat intel_rapl_msr fat intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm mgag200 rapl i2c_algo_bit intel_cstate drm_shmem_helper iTCO_wdt dcdbas acpi_ipmi iTCO_vendor_support isst_if_mmio isst_if_mbox_pci ses drm_kms_helper intel_uncore ipmi_si dell_smbios pcspkr dell_wmi_descriptor wmi_bmof isst_if_common enclosure mei_me ipmi_devintf i2c_i801 mei intel_pch_thermal intel_vsec i2c_smbus ipmi_msghandler acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc xfs libcrc32c sd_mod t10_pi sg ahci libahci crct10dif_pclmul mpt3sas bnxt_en libata crc32_pclmul
Dec  9 11:09:02 cal-s03 kernel: crc32c_intel tg3 megaraid_sas raid_class scsi_transport_sas ghash_clmulni_intel wmi dm_mirror dm_region_hash dm_log dm_mod
Dec  9 11:09:02 cal-s03 kernel: CPU: 6 PID: 2682 Comm: nfsd Kdump: loaded Not tainted 5.14.0-503.15.1.el9_5.x86_64 #1
Dec  9 11:09:02 cal-s03 kernel: Hardware name: Dell Inc. PowerEdge R450/073H50, BIOS 1.15.2 09/10/2024
Dec  9 11:09:02 cal-s03 kernel: RIP: 0010:free_large_kmalloc+0x5a/0x80
Dec  9 11:09:02 cal-s03 kernel: Code: da 9c 5b fa be 06 00 00 00 48 89 ef e8 af 25 0a 00 80 e7 02 74 01 fb 48 83 c4 08 44 89 e6 48 89 ef 5b 5d 41 5c e9 d6 28 04 00 <0f> 0b 45 31 e4 80 3d d3 0d fc 01 00 ba 00 f0 ff ff 0f 84 8b 9a 90
Dec  9 11:09:02 cal-s03 kernel: RSP: 0018:ff3479d38193f968 EFLAGS: 00010246
Dec  9 11:09:02 cal-s03 kernel: RAX: 0017ffffc0001000 RBX: ffffffff8711e2e0 RCX: ff18a472537d7140
Dec  9 11:09:02 cal-s03 kernel: RDX: fff6bf4ac0611608 RSI: ffffffff85e583e8 RDI: fff6bf4ac0611600
Dec  9 11:09:02 cal-s03 kernel: RBP: fff6bf4ac0611600 R08: ffffffff87806248 R09: ff18a4917f5b44b0
Dec  9 11:09:02 cal-s03 kernel: R10: 00000000000002ef R11: 0000000000991359 R12: ffffffff85e583e8
Dec  9 11:09:02 cal-s03 kernel: R13: 0000000000000000 R14: ff18a47212f7d300 R15: ff18a4724b20c000
Dec  9 11:09:02 cal-s03 kernel: FS:  0000000000000000(0000) GS:ff18a4917f580000(0000) knlGS:0000000000000000
Dec  9 11:09:02 cal-s03 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  9 11:09:02 cal-s03 kernel: CR2: 00007f4bdbb27584 CR3: 0000002019e10004 CR4: 0000000000771ef0
Dec  9 11:09:02 cal-s03 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec  9 11:09:02 cal-s03 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Dec  9 11:09:02 cal-s03 kernel: PKRU: 55555554
Dec  9 11:09:02 cal-s03 kernel: Call Trace:
Dec  9 11:09:02 cal-s03 kernel: <TASK>
Dec  9 11:09:02 cal-s03 kernel: ? show_trace_log_lvl+0x1c4/0x2df
Dec  9 11:09:02 cal-s03 kernel: ? show_trace_log_lvl+0x1c4/0x2df
Dec  9 11:09:02 cal-s03 kernel: ? security_release_secctx+0x25/0x40
Dec  9 11:09:02 cal-s03 kernel: ? free_large_kmalloc+0x5a/0x80
Dec  9 11:09:02 cal-s03 kernel: ? __warn+0x7e/0xd0
Dec  9 11:09:02 cal-s03 kernel: ? free_large_kmalloc+0x5a/0x80
Dec  9 11:09:02 cal-s03 kernel: ? report_bug+0x100/0x140
Dec  9 11:09:02 cal-s03 kernel: ? handle_bug+0x3c/0x70
Dec  9 11:09:02 cal-s03 kernel: ? exc_invalid_op+0x14/0x70
Dec  9 11:09:02 cal-s03 kernel: ? asm_exc_invalid_op+0x16/0x20
Dec  9 11:09:02 cal-s03 kernel: ? lookup_dcache+0x18/0x60
Dec  9 11:09:02 cal-s03 kernel: ? lookup_dcache+0x18/0x60
Dec  9 11:09:02 cal-s03 kernel: ? free_large_kmalloc+0x5a/0x80
Dec  9 11:09:02 cal-s03 kernel: ? lookup_dcache+0x18/0x60
Dec  9 11:09:02 cal-s03 kernel: security_release_secctx+0x25/0x40
Dec  9 11:09:02 cal-s03 kernel: nfsd4_encode_fattr4+0x2cc/0x4f0 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: ? avc_has_perm_noaudit+0x94/0x110
Dec  9 11:09:02 cal-s03 kernel: ? selinux_inode_permission+0x10e/0x1d0
Dec  9 11:09:02 cal-s03 kernel: ? __d_lookup+0x73/0xb0
Dec  9 11:09:02 cal-s03 kernel: ? d_lookup+0x29/0x50
Dec  9 11:09:02 cal-s03 kernel: ? lookup_dcache+0x18/0x60
Dec  9 11:09:02 cal-s03 kernel: nfsd4_encode_entry4_fattr+0xcd/0x1e0 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: nfsd4_encode_entry4+0x100/0x290 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: nfsd_buffered_readdir+0x141/0x250 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: ? __pfx_nfsd4_encode_entry4+0x10/0x10 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: ? __pfx_nfsd_buffered_filldir+0x10/0x10 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: ? __pfx_nfsd4_encode_entry4+0x10/0x10 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: nfsd_readdir+0xa9/0xe0 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: nfsd4_encode_readdir+0xf8/0x1d0 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: nfsd4_encode_operation+0xa3/0x2b0 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: nfsd4_proc_compound+0x1d0/0x700 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: nfsd_dispatch+0xe6/0x220 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: svc_process_common+0x2e4/0x650 [sunrpc]
Dec  9 11:09:02 cal-s03 kernel: ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: svc_process+0x12d/0x170 [sunrpc]
Dec  9 11:09:02 cal-s03 kernel: svc_handle_xprt+0x448/0x580 [sunrpc]
Dec  9 11:09:02 cal-s03 kernel: svc_recv+0x17a/0x2c0 [sunrpc]
Dec  9 11:09:02 cal-s03 kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: nfsd+0x84/0xb0 [nfsd]
Dec  9 11:09:02 cal-s03 kernel: kthread+0xdd/0x100
Dec  9 11:09:02 cal-s03 kernel: ? __pfx_kthread+0x10/0x10
Dec  9 11:09:02 cal-s03 kernel: ret_from_fork+0x29/0x50
Dec  9 11:09:02 cal-s03 kernel: </TASK>
Dec  9 11:09:02 cal-s03 kernel: ---[ end trace 0000000000000000 ]---
Dec  9 11:09:02 cal-s03 kernel: object pointer: 0x00000000aa54dbdb

As mentioned by others, a workaround is to go back to the kernel from el9.4.
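
For anyone who wants the rollback spelled out, here is a minimal sketch. It assumes the 9.4 kernel is still installed (dnf normally keeps a few older kernels around) and that `grubby` manages the boot entries, as on a stock Rocky 9 install. The helper name `newest_el9_4_kernel` is made up for illustration, and the grubby calls are left as comments because they change the boot default and need root.

```shell
# Reads kernel image paths on stdin and prints the newest one built for el9_4.
# Prints nothing if no el9_4 kernel is in the list.
newest_el9_4_kernel() {
    grep -E '\.el9_4\.' | sort -V | tail -n 1
}

# On the server (as root) this could be used roughly like this:
#   target=$(ls /boot/vmlinuz-* | newest_el9_4_kernel)
#   grubby --set-default "$target"   # make the 9.4 kernel the boot default
#   grubby --default-kernel          # confirm which kernel boots next
#   reboot
```

Remember to switch the default back once a fixed el9_5 kernel is installed, otherwise the machine stays on the old kernel.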

If you are concerned about running an old kernel (vulnerabilities, etc.), another temporary solution is to use ELRepo's kernel-ml.

https://elrepo.org/wiki/doku.php?id=kernel-ml

I notice there’s a 503.16.1.el9_5 kernel available now. Does anyone know whether it fixes the issue?

From what I heard, the fix will be in kernel-5.14.0-503.18.1.el9_5.

Red Hat has just released kernel-5.14.0-503.19.1.el9_5 which has the fix. Rocky will follow soon.
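
To check quickly which side of the fix a given machine is on, something like the sketch below could work. It assumes the report above is right that 503.19.1 is the first fixed el9_5 build, and it treats every earlier el9_5 build as affected, although only 503.14.1 and 503.15.1 are confirmed as crashing in this thread. `kernel_status` and `FIXED_BUILD` are names invented for this example, and `sort -V` needs GNU coreutils.

```shell
# Assumption: first el9_5 build reported in this thread as containing the fix.
FIXED_BUILD="503.19.1"

# Prints "affected", "fixed", or "not-el9_5" for a kernel release string
# such as the output of `uname -r`.
kernel_status() (
    rel=$1
    case $rel in
        *.el9_5*) ;;
        *) echo "not-el9_5"; exit 0 ;;
    esac
    build=${rel#*-}        # e.g. 503.15.1.el9_5.x86_64
    build=${build%%.el*}   # e.g. 503.15.1
    # Version-sort the two builds; if the fixed build sorts first (or they are
    # equal), the given kernel is at least as new as the fixed one.
    oldest=$(printf '%s\n%s\n' "$FIXED_BUILD" "$build" | sort -V | head -n 1)
    if [ "$oldest" = "$FIXED_BUILD" ]; then
        echo "fixed"
    else
        echo "affected"
    fi
)
```

Running `kernel_status "$(uname -r)"` on the NFS server then reports the status of the kernel that is currently booted.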

Try running NFS protocol 4.0 or 4.1 only and check whether you still get the same errors. We had strange behavior from an NFS server running on Rocky 9.4, with clients hanging when they tried to mount with the 4.2 protocol. (The clients also run Rocky 9.4.)
Things seem better since we configured the clients to mount shares with 4.1 or 4.0.
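
If you want to try that, the sketch below shows one way to check whether the server currently offers 4.2. The stack traces in this thread go through security_release_secctx, i.e. security label handling, which as far as I know is only used with NFSv4.2, so this may sidestep the crash, but nobody here has confirmed that. `nfsd_v42_enabled` is a hypothetical helper, and `server:/export` and `/mnt/export` are placeholder names.

```shell
# Succeeds if NFSv4.2 is enabled in a /proc/fs/nfsd/versions-style string,
# e.g. "+3 +4 +4.1 +4.2" (enabled) or "+3 +4 +4.1 -4.2" (disabled).
nfsd_v42_enabled() {
    case " $1 " in
        *" +4.2 "*) return 0 ;;
        *)          return 1 ;;
    esac
}

# On the server this could be used like:
#   nfsd_v42_enabled "$(cat /proc/fs/nfsd/versions)" && echo "4.2 is on"
# To turn 4.2 off, set "vers4.2=n" in the [nfsd] section of /etc/nfs.conf
# and restart the nfs-server service.
# Client-side alternative, pinning a single mount to 4.1:
#   mount -t nfs -o vers=4.1 server:/export /mnt/export
```

Pinning the version on the clients is the less invasive option, since it needs no change or restart on the server.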

Is there a clear explanation anywhere of the exact problem, and what they say is the solution?