Rocky Linux 9.7 Won't Boot Post Kernel Upgrade - lvm /dev/rl_nv/root does not exist - boots fine on older kernel

Like the title says, the latest kernel does not boot on my Dell C1100 CS24-TY 1U server. Here is a summary of what I see at the dracut emergency shell:

Kernel 5.14.0-570.33.2.el9_6.x86_64 (9.6) boots, detects my LVM partitions, and works fine.

Kernel 5.14.0-570.58.1.el9_6.x86_64 (9.6) will not boot, claiming it can't find my LVM partitions.

Kernel 5.14.0-611.1.el9_7.x86_64 (9.7) will not boot, claiming it can't find my LVM partitions.

Any ideas? I also changed the way the LVM devices are detected by setting

use_devicesfile = 0 in /etc/lvm/lvm.conf and regenerating the initramfs with dracut -f --regenerate-all
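For anyone wanting to reproduce that step, this is a minimal sketch of the change described above, assuming the stock RHEL/Rocky 9 paths (run as root; the sed pattern is illustrative, not from the post):

```shell
# Disable the LVM devices file so LVM falls back to scanning all block
# devices instead of only those listed in /etc/lvm/devices/system.devices
sed -i 's/^[[:space:]]*#\{0,1\}[[:space:]]*use_devicesfile[[:space:]]*=.*/use_devicesfile = 0/' /etc/lvm/lvm.conf

# Rebuild the initramfs for every installed kernel so the change
# takes effect at early boot
dracut -f --regenerate-all
```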

Ran this through Gemini, and I still wasn’t able to get the newer kernel images to work…

This conversation log documents the systematic troubleshooting of a boot failure in Rocky Linux 9.7 (kernel 5.14.0-611) on legacy Intel ICH10R (IMSM) hardware. While the 9.6 kernel (5.14.0-570) boots successfully, the 9.7 kernel fails to assemble the RAID 10 array, preventing the LVM root volume from mounting.


Troubleshooting Log: Rocky Linux 9.7 Kernel & MDRAID/IMSM Failure

1. Initial Assessment: Software vs. Metadata

  • Observation: The system drops to a Dracut emergency shell on Kernel 9.7.

  • Verification: lsinitrd confirmed that mdraid, lvm, ahci.ko, and mdadm.conf are present in the 9.7 initramfs.
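The verification above can be reproduced with lsinitrd. A sketch, assuming the 9.7 kernel version string from the post and the default /boot image naming:

```shell
# Confirm the RAID/LVM modules and drivers made it into the 9.7 initramfs
KVER=5.14.0-611.1.el9_7.x86_64
lsinitrd /boot/initramfs-${KVER}.img | grep -E 'mdraid|lvm|ahci|mdadm.conf'

# Dump the embedded mdadm.conf to check the ARRAY UUIDs and AUTO +imsm line
lsinitrd -f etc/mdadm.conf /boot/initramfs-${KVER}.img
```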

  • Metadata: mdadm.conf contains correct UUIDs and the AUTO +imsm directive.

  • Finding: The failure is not due to missing software, but a failure in the initialization handshake between the kernel and the RAID metadata.

2. Driver Conflict & Platform Checks

  • Hypothesis: Driver mismatch (VMD vs. AHCI) or strict BIOS platform checks.

  • Test: Added imsm_no_platform=1, vmd.max_devices=0, and rd.driver.pre=ahci to GRUB.

  • Result: Failed. The 9.6 kernel uses the ahci driver successfully, but forcing it in 9.7 does not trigger assembly.
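The GRUB change in the test above can be applied with grubby, the stock RHEL/Rocky tool for editing kernel command lines. A sketch using exactly the parameters named in the post (run as root):

```shell
# Append the test parameters to every installed kernel's command line
grubby --update-kernel=ALL \
    --args="imsm_no_platform=1 vmd.max_devices=0 rd.driver.pre=ahci"

# Verify the change took effect for the failing 9.7 kernel
grubby --info=/boot/vmlinuz-5.14.0-611.1.el9_7.x86_64
```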

3. Dracut Shell Diagnostics

  • Manual Assembly: Running mdadm --assemble --scan --run --force in the shell resulted in an active /dev/md127 (Container) but failed to start the member volume /dev/md126.

  • Error: Metadata reports /dev/md126 “has been assembled with 1 device but cannot be started.”

  • Analysis: RAID 10 requires at least 2 disks to start. The kernel is successfully reading metadata from sda but failing to “claim” sdb, sdc, and sdd.
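From the dracut emergency shell, the manual-assembly step above looks like this (the mdstat check is an assumed follow-up, not quoted from the post):

```shell
# Try to assemble every array mdadm can find, forcing degraded starts
mdadm --assemble --scan --run --force

# Inspect what came up: md127 is typically the IMSM container,
# md126 the RAID 10 member volume carved out of it
cat /proc/mdstat
```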

4. Identification of the “Partition Lock” Conflict

  • Critical Observation: cat /proc/partitions shows raw partitions (sda1, sda2, sdb1, etc.) on the physical disks.

  • Diagnosis: The 9.7 kernel is performing a partition scan on the raw disks before the MDRAID assembly. Once the kernel “touches” a partition like sda1, it places a lock on the physical disk. When mdadm tries to claim the disk for the RAID container, it returns a “Device or resource busy” state.

  • Verification: mdadm --examine /dev/sda shows all disks as “Active/Online,” proving the metadata is intact but the disks are locked by the kernel partition manager.
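The two checks above, as they would be run in the emergency shell (the four-disk glob assumes the sda–sdd layout described in the post):

```shell
# See which raw partitions the kernel has already scanned and claimed
cat /proc/partitions

# Read the IMSM metadata directly off each member disk; all four
# reported their slots as Active/Online despite the failed assembly
for d in /dev/sd[abcd]; do
    mdadm --examine "$d"
done
```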

5. Attempted Overrides

  • Unlock Attempt: Used partx -d /dev/sda to manually remove partition mappings and release kernel locks.

  • Result: Even after clearing partitions and stopping zombie arrays, the member volume /dev/md126 fails to spawn from the container /dev/md127.

  • Conclusion: There is a fundamental change in the block device discovery order or udev race condition in Kernel 9.7 that prevents Intel IMSM volumes from initializing on ICH10R chipsets when legacy partition tables are present.
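The override sequence attempted in this section, sketched end to end (device names assume the same sda–sdd layout; run from the emergency shell):

```shell
# Drop the kernel's partition mappings on each member disk so mdadm
# can claim the whole device for the IMSM container
for d in /dev/sd[abcd]; do
    partx -d "$d"
done

# Stop any half-assembled ("zombie") arrays, then retry assembly
mdadm --stop /dev/md126
mdadm --stop /dev/md127
mdadm --assemble --scan --run --force
```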


Hardware & Environment Summary

| Component | Specification |
| --- | --- |
| Server | Dell CS24-TY (Quanta 897a) |
| Chipset | Intel ICH10R SATA Controller [RAID mode] |
| RAID Level | RAID 10 (Intel IMSM / Matrix Storage) |
| Working Kernel | 5.14.0-570 (Rocky 9.6) |
| Failing Kernel | 5.14.0-611 (Rocky 9.7) |
| Storage Stack | Physical Disks → IMSM Container → MD-Volume → LVM |

The longer conversation log (not formatted as nicely as above) is on Pastebin: "Full Conversation Log: Linux Kernel & MDRAID Boot Diagnostics". Any ideas?