Since the upgrade to Rocky Linux 9.5, our Linux servers (VM or physical) frequently hard reboot, or the nfs-server simply stops. I already reinstalled one server; it was stable over the weekend with zero to light NFS 4.2 traffic. Today, with more going on, it has already crashed three times. Two of those crashes were only 20 minutes apart. With Rocky Linux 9.4 everything was rock(y) stable with the same config that I use now.
I upgraded a server running nfs-server to the 9.5 release this morning and am seeing the same behaviour:
Server is non-responsive, won’t ping.
It looks to have crashed a few minutes after initially rebooting and applying the updates, and again maybe two hours after I forcibly rebooted it (so twice in a few hours).
Now back on the previous kernel to see if this helps.
Error:
Dec 3 08:54:24 kernel: [ 3922.574999] WARNING: CPU: 2 PID: 1104 at mm/slab_common.c:957 free_large_kmalloc+0x5a/0x80
Dec 3 08:54:24 kernel: [ 3922.575035] Modules linked in: binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm ib_core rfkill nft_reject_ipv4 nf_reject_ipv4 nft_reject nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vsock_loopback vmw_vsock_virtio_transport_common nf_tables vmw_vsock_vmci_transport vsock nfnetlink vmwgfx intel_rapl_msr vmw_balloon drm_ttm_helper intel_rapl_common ttm drm_kms_helper pcspkr vmw_vmci i2c_piix4 joydev nfsd nfs_acl lockd auth_rpcgss grace drm sunrpc xfs libcrc32c sd_mod crct10dif_pclmul crc32_pclmul t10_pi crc32c_intel ata_generic sg ghash_clmulni_intel ata_piix vmw_pvscsi libata vmxnet3 serio_raw dm_mirror dm_region_hash dm_log dm_mod fuse
Dec 3 08:54:24 kernel: [ 3922.575080] CPU: 2 PID: 1104 Comm: nfsd Not tainted 5.14.0-503.15.1.el9_5.x86_64 #1
Dec 3 08:54:24 kernel: [ 3922.575082] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Dec 3 08:54:24 kernel: [ 3922.575084] RIP: 0010:free_large_kmalloc+0x5a/0x80
Dec 3 08:54:24 kernel: [ 3922.575087] Code: da 9c 5b fa be 06 00 00 00 48 89 ef e8 af 25 0a 00 80 e7 02 74 01 fb 48 83 c4 08 44 89 e6 48 89 ef 5b 5d 41 5c e9 d6 28 04 00 <0f> 0b 45 31 e4 80 3d d3 0d fc 01 00 ba 00 f0 ff ff 0f 84 8b 9a 90
Dec 3 08:54:24 kernel: [ 3922.575088] RSP: 0018:ffffc0fb0243fb28 EFLAGS: 00010246
Dec 3 08:54:24 kernel: [ 3922.575090] RAX: 0017ffffd0000014 RBX: ffffffffb431e2e0 RCX: ffff9f24e50a7db8
Dec 3 08:54:24 kernel: [ 3922.575091] RDX: fffff15a859f4d48 RSI: ffffffffc0b3647c RDI: fffff15a859f4d80
Dec 3 08:54:24 kernel: [ 3922.575092] RBP: fffff15a859f4d80 R08: ffffffffb4a07910 R09: ffff9f25b5eb44b0
Dec 3 08:54:24 kernel: [ 3922.575093] R10: 000000000000000c R11: ffff9f25b5eb1c64 R12: ffffffffc0b3647c
Dec 3 08:54:24 kernel: [ 3922.575094] R13: 0000000000000000 R14: ffff9f25607fa000 R15: ffff9f2492370000
Dec 3 08:54:24 kernel: [ 3922.575095] FS: 0000000000000000(0000) GS:ffff9f25b5e80000(0000) knlGS:0000000000000000
Dec 3 08:54:24 kernel: [ 3922.575096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 3 08:54:24 kernel: [ 3922.575097] CR2: 000000c0005fc000 CR3: 00000001139cc005 CR4: 00000000007706f0
Dec 3 08:54:24 kernel: [ 3922.575111] PKRU: 55555554
Dec 3 08:54:24 kernel: [ 3922.575112] Call Trace:
Dec 3 08:54:24 kernel: [ 3922.575114] <TASK>
Dec 3 08:54:24 kernel: [ 3922.575115] ? srso_alias_return_thunk+0x5/0xfbef5
Dec 3 08:54:24 kernel: [ 3922.575122] ? show_trace_log_lvl+0x26e/0x2df
Dec 3 08:54:24 kernel: [ 3922.575130] ? show_trace_log_lvl+0x26e/0x2df
Dec 3 08:54:24 kernel: [ 3922.575134] ? security_release_secctx+0x28/0x40
Dec 3 08:54:24 kernel: [ 3922.575137] ? free_large_kmalloc+0x5a/0x80
Dec 3 08:54:24 kernel: [ 3922.575140] ? __warn+0x7e/0xd0
Dec 3 08:54:24 kernel: [ 3922.575142] ? free_large_kmalloc+0x5a/0x80
Dec 3 08:54:24 kernel: [ 3922.575145] ? report_bug+0x100/0x140
Dec 3 08:54:24 kernel: [ 3922.575148] ? handle_bug+0x3c/0x70
Dec 3 08:54:24 kernel: [ 3922.575155] ? exc_invalid_op+0x14/0x70
Dec 3 08:54:24 kernel: [ 3922.575157] ? asm_exc_invalid_op+0x16/0x20
Dec 3 08:54:24 kernel: [ 3922.575159] ? _fh_update.part.0.isra.0+0x4c/0x90 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575213] ? _fh_update.part.0.isra.0+0x4c/0x90 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575232] ? free_large_kmalloc+0x5a/0x80
Dec 3 08:54:24 kernel: [ 3922.575235] ? _fh_update.part.0.isra.0+0x4c/0x90 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575251] security_release_secctx+0x28/0x40
Dec 3 08:54:24 kernel: [ 3922.575253] nfsd4_encode_fattr4+0x2cc/0x4f0 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575276] ? srso_alias_return_thunk+0x5/0xfbef5
Dec 3 08:54:24 kernel: [ 3922.575278] ? __kmem_cache_alloc_node+0x18f/0x2e0
Dec 3 08:54:24 kernel: [ 3922.575281] ? security_prepare_creds+0x71/0xa0
Dec 3 08:54:24 kernel: [ 3922.575285] ? security_prepare_creds+0x71/0xa0
Dec 3 08:54:24 kernel: [ 3922.575286] ? srso_alias_return_thunk+0x5/0xfbef5
Dec 3 08:54:24 kernel: [ 3922.575288] ? __kmalloc+0x4b/0x140
Dec 3 08:54:24 kernel: [ 3922.575289] ? srso_alias_return_thunk+0x5/0xfbef5
Dec 3 08:54:24 kernel: [ 3922.575291] ? srso_alias_return_thunk+0x5/0xfbef5
Dec 3 08:54:24 kernel: [ 3922.575292] ? security_prepare_creds+0x47/0xa0
Dec 3 08:54:24 kernel: [ 3922.575294] ? srso_alias_return_thunk+0x5/0xfbef5
Dec 3 08:54:24 kernel: [ 3922.575296] ? prepare_creds+0x180/0x270
Dec 3 08:54:24 kernel: [ 3922.575298] ? srso_alias_return_thunk+0x5/0xfbef5
Dec 3 08:54:24 kernel: [ 3922.575300] ? nfsd_setuser+0x110/0x270 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575320] ? srso_alias_return_thunk+0x5/0xfbef5
Dec 3 08:54:24 kernel: [ 3922.575322] ? nfsd_setuser_and_check_port+0x4a/0xc0 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575340] ? _fh_update.part.0.isra.0+0x4c/0x90 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575364] nfsd4_encode_getattr+0x2b/0x40 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575387] nfsd4_encode_operation+0xa6/0x2b0 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575407] nfsd4_proc_compound+0x1d0/0x700 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575426] nfsd_dispatch+0xe9/0x220 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575447] svc_process_common+0x2e7/0x650 [sunrpc]
Dec 3 08:54:24 kernel: [ 3922.575511] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575535] svc_process+0x12d/0x170 [sunrpc]
Dec 3 08:54:24 kernel: [ 3922.575560] svc_handle_xprt+0x448/0x580 [sunrpc]
Dec 3 08:54:24 kernel: [ 3922.575589] svc_recv+0x17a/0x2c0 [sunrpc]
Dec 3 08:54:24 kernel: [ 3922.575613] ? __pfx_nfsd+0x10/0x10 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575635] nfsd+0x84/0xb0 [nfsd]
Dec 3 08:54:24 kernel: [ 3922.575660] kthread+0xe0/0x100
Dec 3 08:54:24 kernel: [ 3922.575663] ? __pfx_kthread+0x10/0x10
Dec 3 08:54:24 kernel: [ 3922.575666] ret_from_fork+0x2c/0x50
Dec 3 08:54:24 kernel: [ 3922.575671] </TASK>
Dec 3 08:54:24 kernel: [ 3922.575672] ---[ end trace 0000000000000000 ]---
Try running NFS protocol 4.0 or 4.1 only and check whether you still get the same errors. We saw strange behavior from an NFS server running on Rocky 9.4, with clients hanging when they tried to mount with the 4.2 protocol (the clients also run Rocky 9.4).
Things seem better when we configure the clients to mount shares with 4.1 or 4.0.
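If you try this, it helps to confirm which version each client actually negotiated, since a mount without an explicit `vers=4.1` (or `vers=4.0`) option will normally pick the highest version both sides support. Below is a minimal sketch, not a definitive tool: it assumes a Linux client where NFS mounts show up in `/proc/mounts` with a `vers=` (or `nfsvers=`) entry in the option field. The function name `nfs_mount_versions` is just something I made up for illustration.

```python
def nfs_mount_versions(mounts_text):
    """Map each NFS mount point to the version found in its mount options.

    mounts_text is the content of /proc/mounts (assumed format:
    device, mount point, fstype, comma-separated options, ...).
    """
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mount_point = fields[1]
        version = None
        for opt in fields[3].split(","):
            if opt.startswith(("vers=", "nfsvers=")):
                version = opt.split("=", 1)[1]
        # None means no version option was listed for this mount.
        versions[mount_point] = version
    return versions


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point, version in nfs_mount_versions(f.read()).items():
            print(f"{mount_point}: NFS version {version or 'unknown'}")
```

Run it on a client after remounting. Any mount still reporting 4.2 has not picked up the changed mount options yet.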