Random Boot Issues with Rocky Linux 8.9 (Timing related?)

EagleTG · March 26, 2024, 3:58pm

Hello everyone!

I’m running into an issue that’s been driving me mad. I upgraded a fleet of similar workstation machines from Rocky Linux 8.7 to Rocky Linux 8.9 using the standard Yum process. After the upgrade, the machines will now randomly fail to boot. The boot failure shows ‘dracut-initqueue[pid]: Warning: dracut-initqueue timeout - starting timeout scripts’.

Every machine I’m having this issue on is the same hardware and software configuration (scripted via Kickstart/Ansible). The systems in-question are Dell Precision Workstation 3650 Towers with an NVMe boot SSD and a spinning data HDD. The NVMe is configured via LVM and has multiple partitions.

A reboot or two/three will eventually get the machine to boot properly. Frustratingly, as I’ve attempted to collect more information about what is failing, the addition of rd.shell rd.debug and removal of rhgb and quiet from boot parameters seem to completely fix the issue. This (and the random nature of the failure/success) makes me think this is some form of timing issue, but that’s a complete guess. I’ve also tried combinations of the rd.shell/rd.debug and removal of rhgb and quiet parameters. I have not been able to draw a correlation of exactly what combination fixes the issue, but I’ve also been thus far unable to collect additional details.

I’m also unable to collect serial debug information from Dracut as the machine lacks a serial port (and I’ve not yet found a work-around to this, since we are low-level, I assume a USB converter won’t work, do PCI Serial cards work?)

It seems to be ultimately failing on mounting /dev/mapper/vg0-root. Here’s a screenshot from a failed boot attempt:

I also tried modifying journald.conf to have Storage=persistent which allowed me to capture a failed boot and view it with journalctl -b, but the information collected was very similar to the last screenshot posted below, and looking in the logs previous to the reported failures did not yield any clues. It seems (unfortunately) that the cleanup tasks deleted this log since that time. I’ll work to capture another and post a follow-up.

Since I’m new here, I can’t post multiple pictures, but I’ll attempt to follow-up with a few more screenshots.

Any thoughts on anything else I can do to continue troubleshooting, or how do I collect more data?

Thanks!!

EagleTG · March 26, 2024, 3:58pm

And a zoomed-in version of the bottom part of the screen:

EagleTG · March 26, 2024, 3:59pm

While dracut is showing the initqueue-timeout, the following section(s) loop in the console:

gerry666uk · March 26, 2024, 11:07pm

After you got it to boot, were you able to scroll backward in /var/log/messages and see the entries from the previous boot, where it failed?

system · May 25, 2024, 11:07pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Random Boot Issues with Rocky Linux 8.9 (Continued) - Failure to mount LVM Root Rocky Linux Help & Support	5	185	September 22, 2024
Help migrating from CentOS 7.3.1611, Dracut-initqueue timeout on installation or live boot Rocky Linux Help & Support rocky-linux-8 , amd	22	10019	November 12, 2023
Very Strange Boot problem Rocky Linux Help & Support	1	484	August 25, 2023
Diagnosing boot failure Rocky Linux Help & Support rocky-linux-9	7	166	March 19, 2025
Can't boot Live Rocky 8 - Dracut-initqueue timeout - starting timeout scripts message loop Rocky Linux Help & Support	3	210	January 28, 2025

Random Boot Issues with Rocky Linux 8.9 (Timing related?)

Related topics