Hi
Since the upgrade to Rocky Linux 9.5, our Linux servers (VM or physical) frequently hard-reboot, or the NFS server simply stops. I already reinstalled one server, and it was stable over the weekend with zero to light NFS 4.2 traffic. Today, with more going on, it has already crashed three times, and two of those crashes were only 20 minutes apart. With Rocky Linux 9.4 everything was rock(y) stable with the same config that I use now.
Any idea what could be causing this?
[251118.198708] perf: interrupt took too long (2524 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[256533.825978] perf: interrupt took too long (3166 > 3155), lowering kernel.perf_event_max_sample_rate to 63000
[279965.977293] perf: interrupt took too long (3965 > 3957), lowering kernel.perf_event_max_sample_rate to 50000
[326621.176722] ------------[ cut here ]------------
[326621.176728] WARNING: CPU: 18 PID: 3270 at mm/slab_common.c:957 free_large_kmalloc+0x5a/0x80
[326621.176739] Modules linked in: tls binfmt_misc dm_service_time iscsi_tcp libiscsi_tcp libiscsi rpcrdma rdma_cm iw_cm ib_cm ib_core scsi_transport_iscsi nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vfat fat dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif dm_mod kvm dell_wmi_descriptor sparse_keymap rfkill video iTCO_wdt rapl intel_cstate mxm_wmi mei_me dcdbas mei intel_uncore iTCO_vendor_support ipmi_si joydev acpi_power_meter ipmi_devintf ipmi_msghandler pcspkr lpc_ich nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg mgag200 uas usb_storage drm_kms_helper ahci libahci drm_shmem_helper crct10dif_pclmul crc32_pclmul drm ixgbe crc32c_intel libata
[326621.176795] igb ghash_clmulni_intel megaraid_sas i2c_algo_bit mdio dca wmi fuse
[326621.176801] CPU: 18 PID: 3270 Comm: nfsd Kdump: loaded Not tainted 5.14.0-503.14.1.el9_5.x86_64 #1
[326621.176804] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.19.0 12/12/2023
[326621.176806] RIP: 0010:free_large_kmalloc+0x5a/0x80
[326621.176811] Code: da 9c 5b fa be 06 00 00 00 48 89 ef e8 af 25 0a 00 80 e7 02 74 01 fb 48 83 c4 08 44 89 e6 48 89 ef 5b 5d 41 5c e9 d6 28 04 00 <0f> 0b 45 31 e4 80 3d 43 0e fc 01 00 ba 00 f0 ff ff 0f 84 fb 9a 90
[326621.176813] RSP: 0018:ffffb6f9092bb968 EFLAGS: 00010246
[326621.176815] RAX: 0017ffffc0001000 RBX: ffffffff8c31e2e0 RCX: ffff94a6c40f9220
[326621.176816] RDX: fffff42b8cb69608 RSI: ffffffff8b058378 RDI: fffff42b8cb69600
[326621.176818] RBP: fffff42b8cb69600 R08: ffffffff8ca06440 R09: ffff94a9afc744b0
[326621.176819] R10: 00000000000003c8 R11: 0000000000000000 R12: ffffffff8b058378
[326621.176820] R13: 0000000000000000 R14: ffff94a68942ae00 R15: ffff94a9e567c000
[326621.176822] FS: 0000000000000000(0000) GS:ffff94a9afc40000(0000) knlGS:0000000000000000
[326621.176824] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[326621.176825] CR2: 00005626630c5140 CR3: 000000032f410001 CR4: 00000000003706f0
[326621.176827] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[326621.176828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[326621.176829] Call Trace:
[326621.176831] <TASK>
[326621.176833] ? show_trace_log_lvl+0x1c4/0x2df
[326621.176839] ? show_trace_log_lvl+0x1c4/0x2df
[326621.176842] ? security_release_secctx+0x28/0x40
[326621.176846] ? free_large_kmalloc+0x5a/0x80
[326621.176849] ? __warn+0x7e/0xd0
[326621.176852] ? free_large_kmalloc+0x5a/0x80
[326621.176855] ? report_bug+0x100/0x140
[326621.176859] ? handle_bug+0x3c/0x70
[326621.176862] ? exc_invalid_op+0x14/0x70
[326621.176864] ? asm_exc_invalid_op+0x16/0x20
[326621.176868] ? lookup_dcache+0x18/0x60
[326621.176872] ? lookup_dcache+0x18/0x60
[326621.176875] ? free_large_kmalloc+0x5a/0x80
[326621.176878] ? lookup_dcache+0x18/0x60
[326621.176880] security_release_secctx+0x28/0x40
[326621.176883] nfsd4_encode_fattr4+0x2cc/0x4f0 [nfsd]
[326621.176955] ? avc_has_perm_noaudit+0x94/0x110
[326621.176959] ? selinux_inode_permission+0x10e/0x1d0
[326621.176964] ? __d_lookup+0x73/0xb0
[326621.176967] ? d_lookup+0x29/0x50
[326621.176969] ? lookup_dcache+0x18/0x60
[326621.176972] nfsd4_encode_entry4_fattr+0xcd/0x1e0 [nfsd]
[326621.177019] nfsd4_encode_entry4+0x100/0x290 [nfsd]
[326621.177072] nfsd_buffered_readdir+0x144/0x250 [nfsd]
[326621.177114] ? __pfx_nfsd4_encode_entry4+0x10/0x10 [nfsd]
[326621.177170] ? __pfx_nfsd_buffered_filldir+0x10/0x10 [nfsd]
[326621.177211] ? __pfx_nfsd4_encode_entry4+0x10/0x10 [nfsd]
[326621.177255] nfsd_readdir+0xa9/0xe0 [nfsd]
[326621.177296] nfsd4_encode_readdir+0xf8/0x1d0 [nfsd]
[326621.177341] nfsd4_encode_operation+0xa6/0x2b0 [nfsd]
[326621.177386] nfsd4_proc_compound+0x1d0/0x700 [nfsd]
[326621.177446] nfsd_dispatch+0xe9/0x220 [nfsd]
[326621.177487] svc_process_common+0x2e7/0x650 [sunrpc]
[326621.177583] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[326621.177623] svc_process+0x12d/0x170 [sunrpc]
[326621.177691] svc_handle_xprt+0x448/0x580 [sunrpc]
[326621.177750] svc_recv+0x17a/0x2c0 [sunrpc]
[326621.177819] ? __pfx_nfsd+0x10/0x10 [nfsd]
[326621.177858] nfsd+0x84/0xb0 [nfsd]
[326621.177896] kthread+0xe0/0x100
[326621.177900] ? __pfx_kthread+0x10/0x10
[326621.177904] ret_from_fork+0x2c/0x50
[326621.177919] </TASK>
[326621.177920] ---[ end trace 0000000000000000 ]---
[326621.177922] object pointer: 0x00000000e53caba2
[326621.179321] BUG: unable to handle page fault for address: ffff94a86da58000
[326621.179324] #PF: supervisor write access in kernel mode
[326621.179327] #PF: error_code(0x0003) - permissions violation
[326621.179330] PGD 330801067 P4D 330801067 PUD 100207063 PMD 800000032da000a1
[326621.179337] Oops: 0003 [#1] PREEMPT SMP PTI
[326621.179341] CPU: 18 PID: 3270 Comm: nfsd Kdump: loaded Tainted: G W ------- --- 5.14.0-503.14.1.el9_5.x86_64 #1
[326621.179345] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.19.0 12/12/2023
[326621.179347] RIP: 0010:svc_process_common+0xe7/0x650 [sunrpc]
[326621.179466] Code: 00 00 48 c7 87 80 02 00 00 00 00 00 00 48 29 d0 48 c1 f8 03 c1 e0 0c 89 87 cc 02 00 00 4c 89 e7 e8 ce a9 00 00 48 85 c0 74 02 <89> 18 be 04 00 00 00 4c 89 e7 e8 ba a9 00 00 48 85 c0 74 06 c7 00
[326621.179468] RSP: 0018:ffffb6f9092bbe28 EFLAGS: 00010286
[326621.179470] RAX: ffff94a86da58000 RBX: 000000000bc19c07 RCX: ffff94a86da58000
[326621.179471] RDX: ffff94a9e567c2e8 RSI: 0000000000000004 RDI: ffff94a9e567c238
[326621.179472] RBP: ffff94a9e567c000 R08: ffff94a9e567c1a0 R09: 0000000000000000
[326621.179473] R10: 0000000000000006 R11: 0000000000001000 R12: ffff94a9e567c238
[326621.179474] R13: ffff94a9c23f8f00 R14: ffff94a9c23f8784 R15: ffff94a9e567c000
[326621.179475] FS: 0000000000000000(0000) GS:ffff94a9afc40000(0000) knlGS:0000000000000000
[326621.179477] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[326621.179478] CR2: ffff94a86da58000 CR3: 000000032f410001 CR4: 00000000003706f0
[326621.179479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[326621.179480] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[326621.179481] Call Trace:
[326621.179483] <TASK>
[326621.179484] ? show_trace_log_lvl+0x1c4/0x2df
[326621.179488] ? show_trace_log_lvl+0x1c4/0x2df
[326621.179492] ? svc_process+0x12d/0x170 [sunrpc]
[326621.179547] ? __die_body.cold+0x8/0xd
[326621.179551] ? page_fault_oops+0x134/0x170
[326621.179554] ? kernelmode_fixup_or_oops+0x84/0x110
[326621.179557] ? exc_page_fault+0xa8/0x150
[326621.179561] ? asm_exc_page_fault+0x22/0x30
[326621.179565] ? svc_process_common+0xe7/0x650 [sunrpc]
[326621.179621] ? svc_process_common+0xe2/0x650 [sunrpc]
[326621.179678] svc_process+0x12d/0x170 [sunrpc]
[326621.179736] svc_handle_xprt+0x448/0x580 [sunrpc]
[326621.179796] svc_recv+0x17a/0x2c0 [sunrpc]
[326621.179856] ? __pfx_nfsd+0x10/0x10 [nfsd]
[326621.179896] nfsd+0x84/0xb0 [nfsd]
[326621.179936] kthread+0xe0/0x100
[326621.179940] ? __pfx_kthread+0x10/0x10
[326621.179943] ret_from_fork+0x2c/0x50
[326621.179947] </TASK>
[326621.179948] Modules linked in: tls binfmt_misc dm_service_time iscsi_tcp libiscsi_tcp libiscsi rpcrdma rdma_cm iw_cm ib_cm ib_core scsi_transport_iscsi nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vfat fat dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif dm_mod kvm dell_wmi_descriptor sparse_keymap rfkill video iTCO_wdt rapl intel_cstate mxm_wmi mei_me dcdbas mei intel_uncore iTCO_vendor_support ipmi_si joydev acpi_power_meter ipmi_devintf ipmi_msghandler pcspkr lpc_ich nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg mgag200 uas usb_storage drm_kms_helper ahci libahci drm_shmem_helper crct10dif_pclmul crc32_pclmul drm ixgbe crc32c_intel libata
[326621.179989] igb ghash_clmulni_intel megaraid_sas i2c_algo_bit mdio dca wmi fuse
[326621.179994] CR2: ffff94a86da58000