Intermittent server reboot issue

Hello,
I am writing to ask how to handle the issue below.

Around September 2023, a Dell R750 server was introduced and I installed Rocky 8.6 on it.
The server keeps rebooting intermittently.
When I asked both the hardware vendor and the DB clustering vendor about the issue,
they answered that there are no problems.

Moreover, judging that all CPU clocks are high, the Hyper-Threading and virtualization options have been turned off.

Also, for reference, the TSR log shows the following reboot history:

=========================================
DB Unit 1
2023-12-03 20:50:34 System CPU Resetting.
2023-12-01 11:15:13 System CPU Resetting.
2023-11-30 09:31:25 System CPU Resetting.

DB Unit 2
2024-01-20 09:31:25 System CPU Resetting.

=========================================

My question is: should I update the kernel, by any chance?

  • Kernel version: 4.18.0-372.9.1.el8
  • Used as 2 MariaDB servers
  • LifeKeeper (DB clustering)

I am looking forward to your reply.
Thanks in advance!

DB unit1 dmesg

[3016178.877786] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4Modules linked in: binfmt_misc bonding tls nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink vfat fat dm_service_time dm_queue_length dm_multipath intel_rapl_msr intel_rapl_common ipmi_ssif i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt dell_smbios crct10dif_pclmul iTCO_vendor_support wmi_bmof dell_wmi_descriptor crc32_pclmul mgag200 ghash_clmulni_intel dcdbas i2c_algo_bit rapl intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_mbox_pci sysimgblt cdc_ether isst_if_mmio fb_sys_fops intel_uncore pcspkr usbnet mei_me acpi_ipmi isst_if_common drm mii i2c_i801 mei ipmi_si wmi intel_pmt acpi_power_meter xfs libcrc32c sd_mod sg lpfc nvmet_fc nvmet nvme_fc ahci libahci nvme_fabrics megaraid_sas crc32c_intel bnxt_en nvme_core libata tg3 t10_pi scsi_transport_fc dm_mirror dm_region_hash dm_log dm_mod
[3016178.877818] ipmi_devintf ipmi_msghandler
[3016178.877819] CPU: 4 PID: 4394 Comm: lcm Kdump: loaded Not tainted 4.18.0-372.9.1.el8.x86_64 #1
[3016178.877820] Hardware name: Dell Inc. PowerEdge R750/0216NK, BIOS 1.10.2 03/03/2023
[3016178.877821] RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1b0
[3016178.877821] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
[3016178.877822] RSP: 0018:ff523a94089978a0 EFLAGS: 00000002
[3016178.877823] RAX: 0000000000200101 RBX: ff2757919be54000 RCX: 0000000000000004
[3016178.877823] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff2757ac7ff2ae40
[3016178.877824] RBP: ff2757ac7ff2ae40 R08: ff2757ac7ff2a760 R09: ff27578d80400270
[3016178.877824] R10: 0000000000000000 R11: ffffffffa3a5b548 R12: 0000000000000000
[3016178.877825] R13: ff2757919be54bbc R14: 0000000000000087 R15: 0000000000000004
[3016178.877825] FS: 000014aa82835f40(0000) GS:ff2757ac7ff00000(0000) knlGS:0000000000000000
[3016178.877826] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3016178.877826] CR2: 00007f9908011b10 CR3: 00000001fd130001 CR4: 0000000000771ee0
[3016178.877827] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3016178.877827] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3016178.877828] PKRU: 55555554
[3016178.877828] Call Trace:
[3016178.877828] _raw_spin_lock+0x1a/0x20
[3016178.877829] try_to_wake_up+0x15d/0x510
[3016178.877829] __queue_work+0x13d/0x3e0
[3016178.877830] queue_work_on+0x34/0x40
[3016178.877830] soft_cursor+0x194/0x220
[3016178.877830] bit_cursor+0x3d2/0x610
[3016178.877831] ? bit_putcs+0x550/0x550
[3016178.877831] ? fbcon_cursor+0xff/0x170
[3016178.877832] hide_cursor+0x2a/0xa0
[3016178.877832] vt_console_print+0x3bd/0x400
[3016178.877833] console_unlock+0x35f/0x4a0
[3016178.877833] vprintk_emit+0x14d/0x250
[3016178.877833] printk+0x58/0x6f
[3016178.877834] __warn_printk+0x46/0x87
[3016178.877834] ? sched_clock+0x5/0x10
[3016178.877834] update_blocked_averages+0x6af/0x6e0
[3016178.877835] ? entry_SYSCALL_64_after_hwframe+0xb8/0xca
[3016178.877835] newidle_balance+0xcb/0x3c0
[3016178.877836] ? __switch_to_asm+0x41/0x70
[3016178.877836] pick_next_task_fair+0x3e/0x3b0
[3016178.877836] __schedule+0x146/0x830
[3016178.877837] ? hrtimer_start_range_ns+0x1b6/0x2c0
[3016178.877837] schedule+0x35/0xa0
[3016178.877838] do_nanosleep+0x81/0x160
[3016178.877838] hrtimer_nanosleep+0x89/0x110
[3016178.877838] ? hrtimer_init_sleeper+0x90/0x90
[3016178.877839] __x64_sys_nanosleep+0x96/0xd0
[3016178.877839] do_syscall_64+0x5b/0x1a0
[3016178.877839] entry_SYSCALL_64_after_hwframe+0x65/0xca
[3016178.877840] RIP: 0033:0x14aa80e2d648
[3016178.877840] Code: 48 2c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c5 0f 1f 00 f3 0f 1e fa 48 8d 05 55 9f 2c 00 8b 00 85 c0 75 17 b8 23 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 50 c3 0f 1f 80 00 00 00 00 55 48 89 f5 53 48
[3016178.877841] RSP: 002b:00007ffcab4d4db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000023
[3016178.877842] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000014aa80e2d648
[3016178.877843] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffcab4d4ed0
[3016178.877843] RBP: 00007ffcab4d4dd0 R08: 0000000000000000 R09: 000014aa80eb3820
[3016178.877844] R10: 0000000000000008 R11: 0000000000000246 R12: 000000000000001d
[3016178.877844] R13: 0000000088888889 R14: 00007ffcab4d4ed0 R15: 00007ffcab4d6490
[3016178.877845] Kernel panic - not syncing: Hard LOCKUP
[3016178.877845] CPU: 4 PID: 4394 Comm: lcm Kdump: loaded Not tainted 4.18.0-372.9.1.el8.x86_64 #1
[3016178.877846] Hardware name: Dell Inc. PowerEdge R750/0216NK, BIOS 1.10.2 03/03/2023
[3016178.877846] Call Trace:
[3016178.877846]
[3016178.877847] dump_stack+0x41/0x60
[3016178.877847] panic+0xe7/0x2ac
[3016178.877848] nmi_panic.cold.11+0xc/0xc
[3016178.877848] watchdog_overflow_callback.cold.7+0x5c/0x70
[3016178.877848] __perf_event_overflow+0x52/0xf0
[3016178.877849] handle_pmi_common+0x1f7/0x2d0
[3016178.877849] ? __set_pte_vaddr+0x32/0x50
[3016178.877850] ? __native_set_fixmap+0x24/0x30
[3016178.877850] intel_pmu_handle_irq+0xeb/0x410
[3016178.877850] perf_event_nmi_handler+0x2d/0x50
[3016178.877851] nmi_handle+0x63/0x110
[3016178.877851] default_do_nmi+0x49/0x100
[3016178.877852] do_nmi+0x1af/0x220
[3016178.877852] end_repeat_nmi+0x16/0x6f
[3016178.877853] RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1b0
[3016178.877853] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
[3016178.877854] RSP: 0018:ff523a94089978a0 EFLAGS: 00000002
[3016178.877855] RAX: 0000000000200101 RBX: ff2757919be54000 RCX: 0000000000000004
[3016178.877855] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff2757ac7ff2ae40
[3016178.877856] RBP: ff2757ac7ff2ae40 R08: ff2757ac7ff2a760 R09: ff27578d80400270
[3016178.877856] R10: 0000000000000000 R11: ffffffffa3a5b548 R12: 0000000000000000
[3016178.877857] R13: ff2757919be54bbc R14: 0000000000000087 R15: 0000000000000004
[3016178.877857] ? native_queued_spin_lock_slowpath+0x5b/0x1b0
[3016178.877858] ? native_queued_spin_lock_slowpath+0x5b/0x1b0
[3016178.877858]
[3016178.877858] _raw_spin_lock+0x1a/0x20
[3016178.877859] try_to_wake_up+0x15d/0x510
[3016178.877859] __queue_work+0x13d/0x3e0
[3016178.877860] queue_work_on+0x34/0x40
[3016178.877860] soft_cursor+0x194/0x220
[3016178.877860] bit_cursor+0x3d2/0x610
[3016178.877861] ? bit_putcs+0x550/0x550
[3016178.877861] ? fbcon_cursor+0xff/0x170
[3016178.877861] hide_cursor+0x2a/0xa0
[3016178.877862] vt_console_print+0x3bd/0x400
[3016178.877862] console_unlock+0x35f/0x4a0
[3016178.877863] vprintk_emit+0x14d/0x250
[3016178.877863] printk+0x58/0x6f
[3016178.877863] __warn_printk+0x46/0x87
[3016178.877864] ? sched_clock+0x5/0x10
[3016178.877864] update_blocked_averages+0x6af/0x6e0
[3016178.877864] ? entry_SYSCALL_64_after_hwframe+0xb8/0xca
[3016178.877865] newidle_balance+0xcb/0x3c0
[3016178.877865] ? __switch_to_asm+0x41/0x70
[3016178.877865] pick_next_task_fair+0x3e/0x3b0
[3016178.877866] __schedule+0x146/0x830
[3016178.877866] ? hrtimer_start_range_ns+0x1b6/0x2c0
[3016178.877867] schedule+0x35/0xa0
[3016178.877867] do_nanosleep+0x81/0x160
[3016178.877867] hrtimer_nanosleep+0x89/0x110
[3016178.877868] ? hrtimer_init_sleeper+0x90/0x90
[3016178.877868] __x64_sys_nanosleep+0x96/0xd0
[3016178.877868] do_syscall_64+0x5b/0x1a0
[3016178.877869] entry_SYSCALL_64_after_hwframe+0x65/0xca
[3016178.877869] RIP: 0033:0x14aa80e2d648
[3016178.877870] Code: 48 2c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c5 0f 1f 00 f3 0f 1e fa 48 8d 05 55 9f 2c 00 8b 00 85 c0 75 17 b8 23 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 50 c3 0f 1f 80 00 00 00 00 55 48 89 f5 53 48
[3016178.877871] RSP: 002b:00007ffcab4d4db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000023
[3016178.877872] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000014aa80e2d648
[3016178.877872] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffcab4d4ed0
[3016178.877873] RBP: 00007ffcab4d4dd0 R08: 0000000000000000 R09: 000014aa80eb3820
[3016178.877873] R10: 0000000000000008 R11: 0000000000000246 R12: 000000000000001d
[3016178.877874] R13: 0000000088888889 R14: 00007ffcab4d4ed0 R15: 00007ffcab4d6490

DB unit2 dmesg

[3016178.877786] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4Modules linked in: binfmt_misc bonding tls nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink vfat fat dm_service_time dm_queue_length dm_multipath intel_rapl_msr intel_rapl_common ipmi_ssif i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt dell_smbios crct10dif_pclmul iTCO_vendor_support wmi_bmof dell_wmi_descriptor crc32_pclmul mgag200 ghash_clmulni_intel dcdbas i2c_algo_bit rapl intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_mbox_pci sysimgblt cdc_ether isst_if_mmio fb_sys_fops intel_uncore pcspkr usbnet mei_me acpi_ipmi isst_if_common drm mii i2c_i801 mei ipmi_si wmi intel_pmt acpi_power_meter xfs libcrc32c sd_mod sg lpfc nvmet_fc nvmet nvme_fc ahci libahci nvme_fabrics megaraid_sas crc32c_intel bnxt_en nvme_core libata tg3 t10_pi scsi_transport_fc dm_mirror dm_region_hash dm_log dm_mod
[3016178.877818] ipmi_devintf ipmi_msghandler
[3016178.877819] CPU: 4 PID: 4394 Comm: lcm Kdump: loaded Not tainted 4.18.0-372.9.1.el8.x86_64 #1
[3016178.877820] Hardware name: Dell Inc. PowerEdge R750/0216NK, BIOS 1.10.2 03/03/2023
[3016178.877821] RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1b0
[3016178.877821] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
[3016178.877822] RSP: 0018:ff523a94089978a0 EFLAGS: 00000002
[3016178.877823] RAX: 0000000000200101 RBX: ff2757919be54000 RCX: 0000000000000004
[3016178.877823] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff2757ac7ff2ae40
[3016178.877824] RBP: ff2757ac7ff2ae40 R08: ff2757ac7ff2a760 R09: ff27578d80400270
[3016178.877824] R10: 0000000000000000 R11: ffffffffa3a5b548 R12: 0000000000000000
[3016178.877825] R13: ff2757919be54bbc R14: 0000000000000087 R15: 0000000000000004
[3016178.877825] FS: 000014aa82835f40(0000) GS:ff2757ac7ff00000(0000) knlGS:0000000000000000
[3016178.877826] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3016178.877826] CR2: 00007f9908011b10 CR3: 00000001fd130001 CR4: 0000000000771ee0
[3016178.877827] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3016178.877827] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3016178.877828] PKRU: 55555554
[3016178.877828] Call Trace:
[3016178.877828] _raw_spin_lock+0x1a/0x20
[3016178.877829] try_to_wake_up+0x15d/0x510
[3016178.877829] __queue_work+0x13d/0x3e0
[3016178.877830] queue_work_on+0x34/0x40
[3016178.877830] soft_cursor+0x194/0x220
[3016178.877830] bit_cursor+0x3d2/0x610
[3016178.877831] ? bit_putcs+0x550/0x550
[3016178.877831] ? fbcon_cursor+0xff/0x170
[3016178.877832] hide_cursor+0x2a/0xa0
[3016178.877832] vt_console_print+0x3bd/0x400
[3016178.877833] console_unlock+0x35f/0x4a0
[3016178.877833] vprintk_emit+0x14d/0x250
[3016178.877833] printk+0x58/0x6f
[3016178.877834] __warn_printk+0x46/0x87
[3016178.877834] ? sched_clock+0x5/0x10
[3016178.877834] update_blocked_averages+0x6af/0x6e0
[3016178.877835] ? entry_SYSCALL_64_after_hwframe+0xb8/0xca
[3016178.877835] newidle_balance+0xcb/0x3c0
[3016178.877836] ? __switch_to_asm+0x41/0x70
[3016178.877836] pick_next_task_fair+0x3e/0x3b0
[3016178.877836] __schedule+0x146/0x830
[3016178.877837] ? hrtimer_start_range_ns+0x1b6/0x2c0
[3016178.877837] schedule+0x35/0xa0
[3016178.877838] do_nanosleep+0x81/0x160
[3016178.877838] hrtimer_nanosleep+0x89/0x110
[3016178.877838] ? hrtimer_init_sleeper+0x90/0x90
[3016178.877839] __x64_sys_nanosleep+0x96/0xd0
[3016178.877839] do_syscall_64+0x5b/0x1a0
[3016178.877839] entry_SYSCALL_64_after_hwframe+0x65/0xca
[3016178.877840] RIP: 0033:0x14aa80e2d648
[3016178.877840] Code: 48 2c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c5 0f 1f 00 f3 0f 1e fa 48 8d 05 55 9f 2c 00 8b 00 85 c0 75 17 b8 23 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 50 c3 0f 1f 80 00 00 00 00 55 48 89 f5 53 48
[3016178.877841] RSP: 002b:00007ffcab4d4db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000023
[3016178.877842] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000014aa80e2d648
[3016178.877843] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffcab4d4ed0
[3016178.877843] RBP: 00007ffcab4d4dd0 R08: 0000000000000000 R09: 000014aa80eb3820
[3016178.877844] R10: 0000000000000008 R11: 0000000000000246 R12: 000000000000001d
[3016178.877844] R13: 0000000088888889 R14: 00007ffcab4d4ed0 R15: 00007ffcab4d6490
[3016178.877845] Kernel panic - not syncing: Hard LOCKUP
[3016178.877845] CPU: 4 PID: 4394 Comm: lcm Kdump: loaded Not tainted 4.18.0-372.9.1.el8.x86_64 #1
[3016178.877846] Hardware name: Dell Inc. PowerEdge R750/0216NK, BIOS 1.10.2 03/03/2023
[3016178.877846] Call Trace:
[3016178.877846]
[3016178.877847] dump_stack+0x41/0x60
[3016178.877847] panic+0xe7/0x2ac
[3016178.877848] nmi_panic.cold.11+0xc/0xc
[3016178.877848] watchdog_overflow_callback.cold.7+0x5c/0x70
[3016178.877848] __perf_event_overflow+0x52/0xf0
[3016178.877849] handle_pmi_common+0x1f7/0x2d0
[3016178.877849] ? __set_pte_vaddr+0x32/0x50
[3016178.877850] ? __native_set_fixmap+0x24/0x30
[3016178.877850] intel_pmu_handle_irq+0xeb/0x410
[3016178.877850] perf_event_nmi_handler+0x2d/0x50
[3016178.877851] nmi_handle+0x63/0x110
[3016178.877851] default_do_nmi+0x49/0x100
[3016178.877852] do_nmi+0x1af/0x220
[3016178.877852] end_repeat_nmi+0x16/0x6f
[3016178.877853] RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1b0
[3016178.877853] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
[3016178.877854] RSP: 0018:ff523a94089978a0 EFLAGS: 00000002
[3016178.877855] RAX: 0000000000200101 RBX: ff2757919be54000 RCX: 0000000000000004
[3016178.877855] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff2757ac7ff2ae40
[3016178.877856] RBP: ff2757ac7ff2ae40 R08: ff2757ac7ff2a760 R09: ff27578d80400270
[3016178.877856] R10: 0000000000000000 R11: ffffffffa3a5b548 R12: 0000000000000000
[3016178.877857] R13: ff2757919be54bbc R14: 0000000000000087 R15: 0000000000000004
[3016178.877857] ? native_queued_spin_lock_slowpath+0x5b/0x1b0
[3016178.877858] ? native_queued_spin_lock_slowpath+0x5b/0x1b0
[3016178.877858]
[3016178.877858] _raw_spin_lock+0x1a/0x20
[3016178.877859] try_to_wake_up+0x15d/0x510
[3016178.877859] __queue_work+0x13d/0x3e0
[3016178.877860] queue_work_on+0x34/0x40
[3016178.877860] soft_cursor+0x194/0x220
[3016178.877860] bit_cursor+0x3d2/0x610
[3016178.877861] ? bit_putcs+0x550/0x550
[3016178.877861] ? fbcon_cursor+0xff/0x170
[3016178.877861] hide_cursor+0x2a/0xa0
[3016178.877862] vt_console_print+0x3bd/0x400
[3016178.877862] console_unlock+0x35f/0x4a0
[3016178.877863] vprintk_emit+0x14d/0x250
[3016178.877863] printk+0x58/0x6f
[3016178.877863] __warn_printk+0x46/0x87
[3016178.877864] ? sched_clock+0x5/0x10
[3016178.877864] update_blocked_averages+0x6af/0x6e0
[3016178.877864] ? entry_SYSCALL_64_after_hwframe+0xb8/0xca
[3016178.877865] newidle_balance+0xcb/0x3c0
[3016178.877865] ? __switch_to_asm+0x41/0x70
[3016178.877865] pick_next_task_fair+0x3e/0x3b0
[3016178.877866] __schedule+0x146/0x830
[3016178.877866] ? hrtimer_start_range_ns+0x1b6/0x2c0
[3016178.877867] schedule+0x35/0xa0
[3016178.877867] do_nanosleep+0x81/0x160
[3016178.877867] hrtimer_nanosleep+0x89/0x110
[3016178.877868] ? hrtimer_init_sleeper+0x90/0x90
[3016178.877868] __x64_sys_nanosleep+0x96/0xd0
[3016178.877868] do_syscall_64+0x5b/0x1a0
[3016178.877869] entry_SYSCALL_64_after_hwframe+0x65/0xca
[3016178.877869] RIP: 0033:0x14aa80e2d648
[3016178.877870] Code: 48 2c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c5 0f 1f 00 f3 0f 1e fa 48 8d 05 55 9f 2c 00 8b 00 85 c0 75 17 b8 23 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 50 c3 0f 1f 80 00 00 00 00 55 48 89 f5 53 48
[3016178.877871] RSP: 002b:00007ffcab4d4db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000023
[3016178.877872] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000014aa80e2d648
[3016178.877872] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffcab4d4ed0
[3016178.877873] RBP: 00007ffcab4d4dd0 R08: 0000000000000000 R09: 000014aa80eb3820
[3016178.877873] R10: 0000000000000008 R11: 0000000000000246 R12: 000000000000001d
[3016178.877874] R13: 0000000088888889 R14: 00007ffcab4d4ed0 R15: 00007ffcab4d6490

Rocky 8.9 is the latest iteration of the 8.x release. If you are on 8.6, you should probably apply updates first and see if the problem goes away. Also note that 8.6 is no longer supported; only the current release, 8.9, is.
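
To confirm whether a host is still running the kernel build quoted in the question, the running release string can be compared against it. This is only a rough Python sketch: the helper names are made up for illustration, and it says nothing about whether a newer build actually fixes the lockup.

```python
import platform
import re

# Kernel build quoted in the question (Rocky 8.6).
REPORTED_KERNEL = "4.18.0-372.9.1.el8"


def kernel_key(release: str) -> tuple:
    """Turn a release such as '4.18.0-372.9.1.el8.x86_64' into a tuple of ints.

    Only the numeric fields before the '.elN' dist tag are used.
    """
    numeric_part = re.split(r"\.el\d", release, maxsplit=1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", numeric_part))


def is_newer_than_reported(release: str) -> bool:
    """True if `release` sorts after the build quoted in the question."""
    return kernel_key(release) > kernel_key(REPORTED_KERNEL)


if __name__ == "__main__":
    running = platform.release()
    if is_newer_than_reported(running):
        print(f"{running}: newer than {REPORTED_KERNEL}")
    else:
        print(f"{running}: not newer than {REPORTED_KERNEL}, updates still pending?")
```

Comparing integer tuples rather than raw strings avoids mis-ordering fields such as 9 and 10.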

Other than that, I suggest running memtest on it (which requires rebooting the machine), since there is every possibility of a hardware fault. Also check whether the BIOS or the firmware of any component in the server needs updating.


To add to iwalker’s post:

Definitely update your BIOS from 1.10.2 to 1.12.1, as both 1.12.1 and its predecessor, 1.11.2, were released back in late 2023 and are marked with an “Importance” of URGENT.

(While you’re at that, it’s probably not a bad idea to update the iDRAC 9 firmware to the latest (7.00.60.00) as well…)
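
If there are several units to check, the installed BIOS version can be read from the OS rather than from the iDRAC. A minimal Python sketch, assuming the kernel exposes the DMI data at /sys/class/dmi/id/bios_version and that the version is purely dotted-numeric like the ones above; the function names are illustrative only.

```python
from pathlib import Path

# Standard sysfs location for the DMI BIOS version string on Linux.
BIOS_VERSION_FILE = Path("/sys/class/dmi/id/bios_version")
# Target version mentioned above.
TARGET_BIOS = "1.12.1"


def parse_version(text: str) -> tuple:
    """Convert '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_bios_update(current: str, target: str = TARGET_BIOS) -> bool:
    """True if `current` is older than `target`."""
    return parse_version(current) < parse_version(target)


if __name__ == "__main__":
    try:
        current = BIOS_VERSION_FILE.read_text().strip()
        outdated = needs_bios_update(current)
    except (OSError, ValueError) as exc:
        print(f"Could not determine BIOS version: {exc}")
    else:
        status = "update recommended" if outdated else "up to date"
        print(f"BIOS {current}: {status} (target {TARGET_BIOS})")
```

The numeric comparison matters because a plain string comparison would rank 1.9.x above 1.10.x.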

I’m fairly certain this is a problem with a bad stick of RAM, having seen this kind of issue before on another PowerEdge running CentOS 7, years ago. Replacing a bad DIMM fixed the issue.
Running MemTest on the server can help verify that part.
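
Before scheduling downtime for MemTest, it may also be worth looking at the kernel's EDAC error counters, since the i10nm_edac module shows up in the module list of the dmesg output. The Python sketch below assumes the usual sysfs layout (/sys/devices/system/edac/mc/mcN/ce_count and ue_count); the function name is illustrative. Zero counts do not prove the DIMMs are healthy, so treat it as a quick hint rather than a replacement for MemTest.

```python
from pathlib import Path

# Usual sysfs root for EDAC memory controllers (assumed layout).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> dict:
    """Return {controller: (corrected, uncorrected)} from EDAC counters."""
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers without readable counters
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (driver not loaded?)")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A non-zero corrected count that keeps growing on one controller would point towards the DIMMs behind it.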

Good luck!

