Kernel is tainted - memory management and kernel crashes

sarudy89 · June 10, 2025, 8:11am

Hello, would someone be able to help?

Issue: during idle or normal workload the PC shuts itself down cleanly, then turns back on.

Clean install Rocky Linux 9.6.
Issue happens on Kernel-5.14.0-570.18.1.el9_6.x86_64
9950X3D, RTX 4090, 186 GB RAM, no EXPO.
Latest BIOS update
NVIDIA driver 550 from RPM Fusion
CPU stress test fine. GPU stress test fine. Memtest all good.
Temps under 100% load: CPU 90 °C, GPU 63 °C

Three crash dumps (vmcore-dmesg.txt and kexec-dmesg.log), as interpreted by Claude.ai:

The crash occurred in the LRU (Least Recently Used) page management code during filesystem unmounting. Specifically:

Location: lru_gen_del_folio.constprop.0+0x12f/0x1a0

Trigger: __list_del_entry_valid+0x2d/0x50

Address: Invalid memory address 0xefffe53885663608
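
The reported address can be sanity-checked by hand. Below is a minimal Python sketch, assuming x86_64 with 4-level paging (48-bit virtual addresses), where a valid ("canonical") address has bits 63 down to 47 all equal. The function name is_canonical is made up for illustration and is not part of any kernel tooling.

```python
def is_canonical(addr, va_bits=48):
    """Return True if addr is a canonical x86_64 virtual address.

    Assumption: va_bits=48 (4-level paging). With 5-level paging it
    would be 57. Canonical means bits 63..(va_bits-1) are all 0 or all 1.
    """
    top = addr >> (va_bits - 1)                    # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1       # 17 one-bits for va_bits=48
    return top == 0 or top == all_ones


# Address taken from the crash report above.
print(is_canonical(0xefffe53885663608))  # False
```

Dereferencing a non-canonical address raises a general protection fault on x86_64, so a list pointer holding this value cannot be followed by the kernel.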

What Was Happening

The system was shutting down normally

During filesystem unmount, the kernel was trying to clean up memory pages

When attempting to remove a page from the LRU list, it encountered corrupted list pointers

The list validation detected an invalid memory address and crashed

Likely Causes

Most Probable:

Memory corruption from one of the tainted kernel modules:

nvidia driver (proprietary)

vmmon (VMware module)

Other out-of-tree modules

Other Possibilities:

Hardware memory issues (RAM corruption)

Race condition in the kernel’s multi-generational LRU code

Filesystem corruption during shutdown

Evidence Supporting This

The kernel is tainted (Tainted: P OE) due to proprietary/out-of-tree modules

The crash happens in memory management, which proprietary drivers often interfere with

The invalid pointer value suggests memory corruption rather than a logic error
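
For reference, the taint letters in "Tainted: P OE" map to bits in the value the kernel exposes in /proc/sys/kernel/tainted. The sketch below decodes such a value in Python. It covers only a subset of the documented flags, and the names TAINT_FLAGS and decode_taint are hypothetical. The value 12289 is illustrative (1 + 4096 + 8192, which is what P, O and E together produce); the actual value on the affected machine is not given in the thread.

```python
# Subset of kernel taint flags: bit position -> (letter, meaning).
# Not the full table; see the kernel's "Tainted kernels" documentation.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    7: ("D", "kernel died recently (oops or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "externally built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value):
    """Return the (letter, meaning) pairs whose bit is set in value."""
    return [
        TAINT_FLAGS[bit]
        for bit in sorted(TAINT_FLAGS)
        if value & (1 << bit)
    ]


# Illustrative value only: P (bit 0) + O (bit 12) + E (bit 13).
for letter, meaning in decode_taint(12289):
    print(letter, "-", meaning)
```

On a live system the input would come from reading /proc/sys/kernel/tainted. A tainted kernel only shows that such modules were loaded; on its own it does not show that they caused the crash.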

After I switched to kernel 5.14.0-570.17.1.el9_6.x86_64, the PC works without crashing. Why does the issue happen with the newer kernel?

iwalker · June 10, 2025, 9:14am

I feel there is something wrong with that kernel. On my HP ProLiant DL360e Gen8 it doesn't work either: it runs for a few hours at most and then crashes on me. I'm currently using the 9.5 kernel (5.14.0-503.40.1.el9_5.x86_64), as that one works fine. I didn't try the previous 570.17 kernel like you have, but I guess that might also work on mine.

I don't know if I have the same errors as you, but since the behaviour is similar, I don't think it's a problem with your hardware. More likely something in this kernel is borked.

The 570-19 kernel is available in Rocky 9 now, so perhaps try that one. I'm going to do the same now by running dnf update.

I'm also contemplating enabling elrepo and installing kernel-lt or kernel-ml, since they are 6.x kernels, which I know work on this server: I was using either Debian 12 or Ubuntu with a 6.x kernel before switching to Rocky 9.

iwalker · June 10, 2025, 9:39am

Actually, the server died on the 570.19 kernel as well, while performing a restic backup to one of my VMs. So it looks like either an elrepo kernel or the 9.5 one.

EDIT: so far kernel-lt from elrepo is working fine, with no crashes. Worst case, I'll end up returning to 5.14.0-503.40.1.
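
When narrowing down which stock kernel introduced a problem, it helps to put the release strings in order. Here is a rough Python sketch that sorts the three full release strings quoted in this thread. The helper name release_key and the regular expression are assumptions for illustration and only handle strings of this el9-style shape.

```python
import re


def release_key(release):
    """Turn e.g. '5.14.0-570.18.1.el9_6.x86_64' into a sortable tuple.

    Assumption: the string looks like <version>-<build>.el<...>, where
    version and build are dot-separated integers.
    """
    m = re.match(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el", release)
    if m is None:
        raise ValueError(f"unrecognised release string: {release}")
    version = tuple(int(p) for p in m.group(1).split("."))
    build = tuple(int(p) for p in m.group(2).split("."))
    return version, build


# Results as reported by the posters in this thread.
reports = {
    "5.14.0-570.18.1.el9_6.x86_64": "crashes",
    "5.14.0-503.40.1.el9_5.x86_64": "works",
    "5.14.0-570.17.1.el9_6.x86_64": "works",
}

for release in sorted(reports, key=release_key):
    print(release, "->", reports[release])
```

Sorted this way, the reports from the two machines in this thread place the change in behaviour between 570.17.1 and 570.18.1, although they come from different hardware and may not share a cause.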

sarudy89 · June 10, 2025, 11:53am

Thanks for looking into it. I have installed kernel-lt and will test to see how things go. I must say, I have spent 2 days debugging this issue. I hope all will be fine for now. Thx!

sarudy89 · June 10, 2025, 3:35pm

@iwalker one question, out of curiosity: is it standard procedure with Rocky to install different kernels, depending on the hardware configuration, to see what works and what doesn't?

Since I am using the latest 9950X3D CPU, I ended up on the elrepo 6.14 kernel, because on kernel-lt I kept getting crashes related to the 3D cache.

Also I could not use the newest elrepo 6.15 kernel because it was too new for Nvidia drivers. So 6.14 seems to be a good middle ground for me.

I am just surprised that I have to be so selective about kernels and nvidia drivers in order not to break things and have the hardware work.

iwalker · June 10, 2025, 3:49pm

You have to remember that EL is Enterprise Linux, and Red Hat builds the distro against a certified hardware list. Since Rocky is based on RHEL, the hardware support in the stock kernels is the same. If you have new hardware, especially something exotic like the very latest CPU, there can be issues, since it's most likely not on the hardware compatibility list that Red Hat built the distro for. In that case you need to use kernel-lt or kernel-ml from elrepo. For example, on older kernels AMD Ryzen CPUs would run the CPU fan at full whack, whereas newer kernels with proper support for the CPU did not.

EL is built for stability, and that doesn't necessarily mean that the latest and greatest hardware is supported. In this instance, you are probably better off with Fedora or, as already mentioned, staying on EL with a newer kernel.

I run Fedora 42 on my Lenovo Thinkpad T15p.

iwalker · June 10, 2025, 3:59pm

Incidentally, Rocky 10 is due out soon with a 6.12 kernel, so it may work better for you.

sarudy89 · June 10, 2025, 4:05pm

Super, thank you. That clears up the confusion I had. I appreciate the explanation.
