Cannot boot latest 8.10 kernel on two of my 100+ Rocky 8 machines

Over the weekend I updated more than 100 servers and workstations (mostly Dell PowerEdge servers or Precision workstations) from Rocky 8.9 to 8.10. For the vast majority of these machines everything went fine. But on two of them, both Precision 5820s, the new kernel 4.18.0-553.5.1.el8_10 shows in the GRUB menu, but when selected it just gives a blank screen forever, with no output at all. The system also never appears online at its configured IP. If we select the previous kernel, each box boots and runs just fine.

I tried reinstalling all the kernel packages with

yum reinstall $(rpm -qa | grep 553.5.1)

but got the same result after rebooting. I also tried ‘dracut -fv’ and ‘grub2-mkconfig …’, with no change.

BTW, I have removed ‘quiet’ from the GRUB boot parameters, so when I select the older kernel in GRUB I see kernel messages immediately. With the new kernel we see nothing.
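A quick way to confirm that the running kernel really booted without ‘rhgb’ or ‘quiet’ is to check /proc/cmdline. The sketch below is a small POSIX sh helper (boot_is_verbose is a hypothetical name, not a system tool) that treats a command line as verbose only when neither word appears as a whole argument.

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) if the kernel command line in $1
# contains neither "rhgb" nor "quiet" as a whole, space-separated argument.
boot_is_verbose() {
    # Pad with spaces so the first and last arguments match the patterns too.
    case " $1 " in
        *" rhgb "*|*" quiet "*) return 1 ;;
    esac
    return 0
}
```

For example, boot_is_verbose "$(cat /proc/cmdline)" && echo verbose. On Rocky 8, grubby --update-kernel=ALL --remove-args="rhgb quiet" should drop both options from every installed kernel entry, which is usually simpler than hand-editing the GRUB configuration.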

Any ideas what might be going on?

While booted into the previous kernel, are you able to see any messages at all from the attempt to boot into the newer kernel?

No, nothing in /var/log/messages at all from those attempts. Since I see no kernel boot messages, I am pretty sure it never gets far enough for the filesystems to be mounted and for any logs to be written to them.

I get no error message from GRUB either, so it seems to find the kernel file to boot, but something very early in the boot locks the system up.

Are these the only two machines that are Precision 5820s, or are there others of this type on which it’s working? If they are the only ones, what output (if any) do you get from the following?
lspci -nn | grep VGA

I have several other Precision 5820s, but after comparing them I found that the two machines with the boot issue are both running a BIOS dated 01/24/2019. All the others, which work, have a later BIOS. So I am going to try upgrading the BIOS later this week.
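Comparing BIOS dates across many machines is easier to script. dmidecode -s bios-release-date (run as root) prints the date in MM/DD/YYYY form, e.g. 01/24/2019. The sketch below reorders it into YYYYMMDD so two dates compare as integers; the function names are hypothetical, and it assumes the zero-padded MM/DD/YYYY format.

```shell
#!/bin/sh
# Hypothetical helper: turn a zero-padded MM/DD/YYYY date (as printed by
# `dmidecode -s bios-release-date`) into YYYYMMDD for numeric comparison.
bios_date_key() {
    IFS=/ read -r mm dd yyyy <<EOF
$1
EOF
    printf '%s%s%s\n' "$yyyy" "$mm" "$dd"
}

# Hypothetical helper: succeed if the BIOS date in $1 is older than the one in $2.
bios_older_than() {
    [ "$(bios_date_key "$1")" -lt "$(bios_date_key "$2")" ]
}
```

For example, bios_older_than "$(dmidecode -s bios-release-date)" 01/01/2020 && echo "old BIOS", where 01/01/2020 is an arbitrary illustrative cutoff, not a known-good Dell release. dmidecode -s bios-version gives the matching version string.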

While they are still in the broken state, can you try the following? Either edit the existing GRUB entry or create a new one:

remove : "rhgb quiet" (already done)
add : "console=tty0 console=ttyS0,115200 rd.debug initcall_debug"

Reboot, then choose the “debug” menu entry. On my machine I got a huge amount of extra detail during boot, BUT
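The same options can also be applied with grubby to just the new kernel’s entry, leaving the known-good kernel untouched. This is a sketch: add_debug_args and the DRY_RUN switch are hypothetical helpers, and the kernel path in the usage line assumes an x86_64 build installed under /boot.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: strip rhgb/quiet and add early-boot debug options
# to the single kernel entry whose image path is given in $1.
add_debug_args() {
    kernel=$1
    run grubby --update-kernel="$kernel" --remove-args="rhgb quiet"
    run grubby --update-kernel="$kernel" \
        --args="console=tty0 console=ttyS0,115200 rd.debug initcall_debug"
}
```

For example, DRY_RUN=1 add_debug_args /boot/vmlinuz-4.18.0-553.5.1.el8_10.x86_64 prints the two grubby commands without running them; drop DRY_RUN (and run as root) to apply them, then check the result with grubby --info=/boot/vmlinuz-4.18.0-553.5.1.el8_10.x86_64.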

Are the machines using UEFI and secure boot?

If so, it might be getting stuck running shim…
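Both questions can be answered from the working kernel. A UEFI boot exposes /sys/firmware/efi, and mokutil --sb-state prints a line such as “SecureBoot enabled”. The sketch below wraps both checks; the function names are hypothetical, and the optional directory argument exists only so the check can be exercised away from a real machine.

```shell
#!/bin/sh
# Hypothetical helper: report "uefi" if the EFI sysfs directory exists
# (default /sys/firmware/efi, overridable via $1), else "legacy".
boot_mode() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo uefi
    else
        echo legacy
    fi
}

# Hypothetical helper: classify `mokutil --sb-state` output read from stdin.
sb_state() {
    if grep -q 'SecureBoot enabled'; then
        echo enabled
    else
        echo disabled-or-unknown
    fi
}
```

For example: boot_mode; mokutil --sb-state | sb_state.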

We tried adding “rd.debug initcall_debug” but still got only a blank screen on boot. We also upgraded the box to the latest BIOS version from Dell and still got only a blank screen. We then tried booting the 8.10 install USB image to see if that would boot, and it did not.

So we have given up at this point and will stick with the older kernel on these boxes until there is a newer Rocky 8 kernel to try.