Boot failed after upgrade from Rocky 9.4 to 9.5

emmett · December 27, 2024, 4:08pm

Last weekend I ran dnf update on four VMs and their host, All of which were on Rocky 9.4

I first verified that two of the VMs successfully booted after running the upgrade, and both worked fine. So I upgraded the two other VMs, and the host.

As usual I caused all VMs to shutdown before rebooting the host.

The host never came backup. Did I mention that the hardware is colocated 40 miles away?

I was not able to ssh into or ping the host, nor of course, any of the VMs.

I tried power cycling the host, just in case that might fix the issue. It didn’t.

On site I found the host failed boot and dumped into the failure shell. So I caused it to reboot and noted that two LVM volumes were not detected. Two were detected and mounted, as was the software raid hard drive. LVM is used on a 1TB SSD (nvme0n1).

The /var and /home volumes were not recognized by the start up process, nor could I find them when logging into the maint shell.

I assumed SSD failed, though I couldn’t see how that was possible, as some of the voumes seemed to work fine.

So I then tried booting from the previous kernel, and lo and behold, the host and all VMs booted. That of course means there is nothing wrong with the SSD.

I’ve removed the “bad” kernel, 5.14.0-503, so now the host boots correctly to 5.14.0-427, without intervention. When I removed the kernel and its associated files, dnf also uninstalled kmod-kvdo and vdo as dependencies. I don’t know if that is related, but I’d never seen those packages before and thought I’d mention them.

Where do I look to help determine what happened? Did this, or something like it, happen to anyone else.

I don’t know enough to create a bug report and will definitely do that once I know more.

jbkt23 · December 27, 2024, 5:08pm

What is the output of this command?

sudo grubby --info=DEFAULT

Did prior kernel updates go without issue or is 9.4 the base install and never updated till now?

Also if you have any added kernel parameters you have to run this command before booting into the new kernel:

grub2-mkconfig --update-bls-cmdline -o /boot/grub2/grub.cfg

This post here on the forum may be of interest:
https://forums.rockylinux.org/t/rocky-linux-9-5-sometimes-boot-into-emergency-mode-for-no-reason/16655

emmett · December 27, 2024, 6:20pm

No output from

grubby --info=DEFAULT

from shell as root. I don’t normally use sudo.

The host started with a fresh install of Rocky 9.1 and was updated every couple of weeks until it got to 9.4. For each update the host booted as expected after running reboot in a root shell.

I waited three or four weeks after the last update at Rocky 9.4 to see if there were any issues with the upgrade to 9.5. Then I first updated and rebooted two of the VMs to verify all was well. But when it took the host server to Rocky 9.5, reboot failed as described in the original post…

As far as I know there are no added kernel parameter. For sure none that I added.

The post you mentioned was “close”, but not enough. This host is hardware colocated at a daacanter. Its main job is to host four VMs, and its other job is as our primary DNS server. It also hosts our backup image archives.

Because the data center where the host resides is 40+ miles away, and because it hosts virtually all of our internet facing infrastructure, it will be difficult to debug simply by adjusting boot parameters and rebooting to see if it successfully reboots. But if that’s what it takes, I’ll do that, but only after I have some clues on what to look for when it fails to boot.

Will there be any logs that remain after I removed the “bad” kernel that could shed some light on what exatcly happened with the attempt to boot the “bad” kernel?

Can you suggest how I can determine why the “bad” kernel didn’t see two of the LVM volumes that the previous kernel has no porblem seeing, even after multiple reboots. Did it have anything to do with vdo?

I need to find a way I can find out if a new kernel will reboot, before I actually reboot after a kernel update. Any sugesstion on that?

Is this something specific to Rocky 9.5, or to the kernel. I suspect it’s not about Rocky 9.5, as simply booting from the previous kernel while the system is fully updated to Rocky 9.5 works as expected.

emmett · December 27, 2024, 6:22pm

Did you mena that I should run

grub2-mkconfig --update-bls-cmdline -o /boot/grub2/grub.cfg

after update and before rebooting?

frankie · December 27, 2024, 7:34pm

well thread starter from the linked thread here. i guess its closer than you think because what i did not mention in that post was that it started as a rocky 9.4 vm and sucessfully upgraded to 9.5 without any issues. these just came after a 9.5 kernel upgrade (on 9.5) was installed.

you should be able to list old boot logs via:

journalctl --list-boots

jbkt23 · December 27, 2024, 8:19pm

Yes, but if you have no added kernel parameters then there’s no need to run that command especially since this all worked fine w/o doing so on all the previous kernel updates.

Try grubby --info=ALL

The other suggestion I have is to read the release notes for the Rocky Version 9.5 update that can be found in the documentation drop down list.

emmett · December 27, 2024, 10:00pm

All four of the VMs I upgraded from 9.4 to 9.5 booted up fine immediately after the upgrade to that VM. It was only the host (hardware) that failed to reboot.

From what you wrote, had I upgraded before the 503 kernel was released, I might not have experienced the boot failure then, but would have eventually, when the 503 kernel was released and deployed on my hardware.

Which tells me the issue has something to do with the 503 kernel.

When I do your suggested joutnalctl --list-boots command on the currently running server I get:

IDX BOOT ID FIRST ENTRY LAST ENTRY
0 02bb85b853d34f7da3ff5ffd67d81fa9 Sun 2024-12-22 15:56:01 PST Fri 2024-12-27 13:14:19 PST

Unfortunately the failed boot happened on the 21st, so the earliest listed is when I successfuly booted the 427 kernel the next day. Indeed, doing journalctl --boot=0 shows me the boot log from the 427 kernal boot.

I guess that means I cannot see any boot logs from when the 503 kernel failed to boot. At least not via journalctl.

I’ll consider reinstalling the 503 kernel while on site, then look to capture the boot log when/if it fails to reboot. If it does fail, and I expect it will, I’ll remove that kernel again and keep looking for way to discover why.

emmett · December 27, 2024, 10:08pm

Running grubby --info=ALL lists only the two 427 and rescue kernel files. Which I would expect, as I removed the 503 kernel.

I will reinstall the 503 kernel and run the grubby info command again. Then remove it again until I figure out why it fails. I’ll share tthe before and after grubby results if it looks that info may shed some light.

I didn’t see anything that looks related to my issue in the release notes, and thanks for the suggestion.

frankie · December 28, 2024, 12:20pm

I see. Have the exact problem. My vm only lists the current boot.

ive applied that soulution here: kernel - Why does `journalctl --list-boots` only show the current boot? - Ask Ubuntu

yes im well aware thats an 7yr old post and a solution for ubuntu but it works

jbkt23 · December 28, 2024, 1:18pm

Here, there are nine posts and we still don’t know anything more about your system setup than the first post which is about nil. How is your system partitioned, how do you do your updates, via ssh and command line or via cockpit web console, and what is on your kernel command line?
Posting output from forensic commands in code blocks, the button above </>, is how it helps us understand. So post the output of these commands:

grubby --info=ALL

lsblk -o name,fstype,uuid,mountpoint

less /etc/fstab

What vm tool are you using?

Another thought that came to me is maybe the problem kernel did not complete its install, I’ve had that happen using the cockpit interface, but it does take a minute or two for kernel scripts to run.

emmett · December 28, 2024, 7:52pm

Ffter removing 503 (bad) kernel:
grubby --info=ALL

index=0
kernel=“/boot/vmlinuz-5.14.0-427.37.1.el9_4.x86_64”
args=“ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet $tuned_params”
root=“/dev/mapper/cl-root”
initrd=“/boot/initramfs-5.14.0-427.37.1.el9_4.x86_64.img $tuned_initrd”
title=“Rocky Linux (5.14.0-427.37.1.el9_4.x86_64) 9.4 (Blue Onyx)”
id=“4a3876baf4f14bcebefd80a1cda9197f-5.14.0-427.37.1.el9_4.x86_64”
index=1
kernel=“/boot/vmlinuz-5.14.0-427.20.1.el9_4.x86_64”
args=“ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet $tuned_params”
root=“/dev/mapper/cl-root”
initrd=“/boot/initramfs-5.14.0-427.20.1.el9_4.x86_64.img $tuned_initrd”
title=“Rocky Linux (5.14.0-427.20.1.el9_4.x86_64) 9.4 (Blue Onyx)”
id=“4a3876baf4f14bcebefd80a1cda9197f-5.14.0-427.20.1.el9_4.x86_64”
index=2
kernel=“/boot/vmlinuz-0-rescue-4a3876baf4f14bcebefd80a1cda9197f”
args=“ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet”
root=“/dev/mapper/cl-root”
initrd=“/boot/initramfs-0-rescue-4a3876baf4f14bcebefd80a1cda9197f.img”
title=“Rocky Linux (0-rescue-4a3876baf4f14bcebefd80a1cda9197f) 9.3 (Blue Onyx)”
id=“4a3876baf4f14bcebefd80a1cda9197f-0-rescue”

lsblk -o name,fstype,uuid,mountpoint:

NAME FSTYPE sda └─md127 ├─store-backup xfs └─store-virt_images xfs sdb └─md127 ├─store-backup xfs └─store-virt_images xfs sdc └─md127 ├─store-backup xfs └─store-virt_images xfs sdd └─md127 ├─store-backup xfs └─store-virt_images xfs nvme0n1
├─nvme0n1p1 vfat ├─nvme0n1p2 ext4 └─nvme0n1p3 ├─cl-root xfs ├─cl-swap swap ├─cl-home xfs └─cl-var xfs UUID MOUNTPOINT
linux_raid_member a4a8a4d4-9a6f-bea3-d723-26db50f64dc1
LVM2_member HsyG0X-cvYP-ztJS-gSbm-mTUe-4qId-bKchcq
739429c4-7409-49fb-bef9-9ea0afdf4b32 /backup
b14df60f-69eb-4a24-8a1b-2428254dc443 /var/lib/libvirt/images
linux_raid_member a4a8a4d4-9a6f-bea3-d723-26db50f64dc1
LVM2_member HsyG0X-cvYP-ztJS-gSbm-mTUe-4qId-bKchcq
739429c4-7409-49fb-bef9-9ea0afdf4b32 /backup
b14df60f-69eb-4a24-8a1b-2428254dc443 /var/lib/libvirt/images
linux_raid_member a4a8a4d4-9a6f-bea3-d723-26db50f64dc1
LVM2_member HsyG0X-cvYP-ztJS-gSbm-mTUe-4qId-bKchcq
739429c4-7409-49fb-bef9-9ea0afdf4b32 /backup
b14df60f-69eb-4a24-8a1b-2428254dc443 /var/lib/libvirt/images
linux_raid_member a4a8a4d4-9a6f-bea3-d723-26db50f64dc1
LVM2_member HsyG0X-cvYP-ztJS-gSbm-mTUe-4qId-bKchcq
739429c4-7409-49fb-bef9-9ea0afdf4b32 /backup
b14df60f-69eb-4a24-8a1b-2428254dc443 /var/lib/libvirt/images
31EB-D37E /boot/efi
82c2eef3-5fdd-453d-ab95-dce74c9ea2f4 /boot
LVM2_member 5jCBd9-U14w-mg2R-opiF-nP8T-Iozw-dKpt6b
98f87ea1-ae61-45b9-acb3-4550aac94473 /
f9acb95f-40dd-4ab7-990d-33bd9795da7f [SWAP]
201060b7-de95-4903-9d2b-25077f2ac9a6 /home
139f6e5c-5dd2-4804-a908-5ebc1e1d7264 /var

less /etc/fstab

/dev/mapper/cl-root / xfs defaults 0 0
/dev/mapper/store-backup /backup xfs defaults 0 0
UUID=82c2eef3-5fdd-453d-ab95-dce74c9ea2f4 /boot ext4 defaults 1 2
UUID=31EB-D37E /boot/efi vfat umask=0077,shortname=winnt 0 2
/dev/mapper/cl-home /home xfs defaults 0 0
/dev/mapper/cl-var /var xfs defaults 0 0
/dev/mapper/store-virt_images /var/lib/libvirt/images xfs defaults 0 0
/dev/mapper/cl-swap none swap defaults 0 0

kinfo from vnc connection

Operating System: Rocky Linux 9.5
KDE Plasma Version: 5.27.11
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.9
Kernel Version: 5.14.0-427.37.1.el9_4.x86_64 (64-bit)
Graphics Platform: offscreen
Processors: 32 × AMD EPYC 7302P 16-Core Processor
Memory: 125.2 GiB of RAM
Graphics Processor: llvmpipe

pvscan
PV /dev/md127 VG store lvm2 [<1.82 TiB / 0 free]
PV /dev/nvme0n1p3 VG cl lvm2 [929.92 GiB / 651.29 GiB free]
Total: 2 [<2.73 TiB] / in use: 2 [<2.73 TiB] / in no VG: 0 [0 ]

lvscan
ACTIVE ‘/dev/store/backup’ [1.33 TiB] inherit
ACTIVE ‘/dev/store/virt_images’ [500.00 GiB] inherit
ACTIVE ‘/dev/cl/root’ [40.00 GiB] inherit
ACTIVE ‘/dev/cl/home’ [20.00 GiB] inherit
ACTIVE ‘/dev/cl/var’ [200.00 GiB] inherit
ACTIVE ‘/dev/cl/swap’ [<18.63 GiB] inherit

--------------------
I seem to remember when I did the pvscan from the maint shell when the boot faild on kernel 503, that the second line didn’t exist. However the lvscan listed both the swap and root volumes, but not the var and home volumes.

I used the command “dnf update” from an SSH remote shell to update to Rocky 9.5. The update completed without errors, just as it had on the four VMs that machine hosts.

cat /proc/cmdline

BOOT_IMAGE=(hd4,gpt2)/vmlinuz-5.14.0-427.37.1.el9_4.x86_64 root=/dev/mapper/cl-root ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet

Unfortunately I didn’t know to record the kernel command before removing the 503 kernel and rebooting.

jbkt23 · December 29, 2024, 1:10pm

From your original post:

On site I found the host failed boot and dumped into the failure shell. So I caused it to reboot and noted that two LVM volumes were not detected. Two were detected and mounted, as was the software raid hard drive. LVM is used on a 1TB SSD (nvme0n1).

I can only guess that for some un explained reason the root= of your commandline did not get written to the new kernel entry.

One way to insure that this is written correctly is to create the /etc/kernel/cmdline file via the method I suggested up thread using grub2-mkconfig.
Only way to confirm all is working on the new kernel install is to be there at the server. I would not install the new kernel remotely until sure it booted correctly.

emmett · December 29, 2024, 4:13pm

The root volumn was available in the maint shell. That and swap were both found and mounted. Ugh.

I did test the upgrade in each VM, alll without a problem. Except for the upgrade to 9.5 on the host (hardware), all upgrades to my Rocky instances have gone flawlessly. So I was quite surprised when this one failed.

I plan to do what you suggest and do the next update on that machine while on-site. Then before booting do all the checks suggested here, and of course run the grub make config command you suggested.

My plan is to collect all the info I can from the miant shell if it fails again, before removing the “bad” kernel and rebooting.

I think it likely that this was an anomoly and all will be well the next time. For sure, from now on I will do updates to the host only when I can get access to the data center. Probaly make an appointment with the tech team there to have them be ready to help.

Thanks for all the comments and suggestions.

anonamoose · January 20, 2025, 4:23am

Good luck Emmett. Sounds like some bad luck. I’ve upgraded Centos and Rocky Linux dozens of time just by updating the yum repos and running yum/dnf update without a hitch. I hope you figure it out.

system · March 21, 2025, 4:24am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

emmett · June 10, 2025, 6:03pm

Setting use_devicesfile = 0 in /etc/lvm/lvm.conf fixed my problem.

Before I found this solution I updated to two other newer kernels and had the same issue.

This is one of the articles that led me to attempt this fix:

There were others, but this on was close enough to my particular problem to cause me to try it.

Thanks for reopening this ticket so I could share the solution.

Topic		Replies	Views
Ssd host boot fails after upgrade to latest Rocky 9 Rocky Linux Help & Support rocky-linux-9 , dell	2	790	September 30, 2023
Rocky 9.5 doesn't boot since I forced Nvidia updates Rocky Linux Help & Support	7	374	November 22, 2024
Rocky 9 kernel-5.14.0-70.17.1.el9_0.x86_64 System falls to boot Rocky Linux Help & Support	5	1666	August 25, 2023
After update to 9.2 Boot issues Rocky Linux Help & Support	9	3011	August 10, 2023
Yum update in Rocky 9 upgrades to Rocky 9.1 and breaks startup Rocky Linux Help & Support	4	3069	August 25, 2023

Boot failed after upgrade from Rocky 9.4 to 9.5

Related topics