We have noticed that in our AWS environments, since the 9.4 AMI was published, our servers do not boot after a kernel/dnf update, which is part of our provisioning.
AMI id: ami-02d6d64456f57b99d
AMI Name: Rocky-9-EC2-Base-9.4-20240509.0.x86_64
Region: eu-west-2
Disk type: gp3
Instance type: t3.medium
The serial console snippet:
[ 15.746077] nvme nvme0: pci function 0000:00:04.0
[ 15.746470] ACPI: \_SB_.LNKD: Enabled at IRQ 11
[ 15.770394] nvme nvme0: 2/0/0 default/read/poll queues
[ 15.775054] nvme0n1: p1 p2 p3 p4
[ 143.289253] dracut-initqueue[429]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[ 143.336391] dracut-initqueue[429]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2floop0p4.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
[ 143.436561] dracut-initqueue[429]: [ -e "/dev/mapper/loop0p4" ]
[ 143.465303] dracut-initqueue[429]: fi"
[ 143.482303] dracut-initqueue[429]: Warning: dracut-initqueue: starting timeout scripts
[ 143.845722] dracut-initqueue[429]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[ 143.892666] dracut-initqueue[429]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2floop0p4.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
[ 143.992287] dracut-initqueue[429]: [ -e "/dev/mapper/loop0p4" ]
[ 144.022260] dracut-initqueue[429]: fi"
[ 144.038265] dracut-initqueue[429]: Warning: dracut-initqueue: starting timeout scripts
[ 144.398278] dracut-initqueue[429]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[ 144.444311] dracut-initqueue[429]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2floop0p4.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
[ 144.541977] dracut-initqueue[429]: [ -e "/dev/mapper/loop0p4" ]
<snip - same message repeated>
[ 209.332749] dracut-initqueue[429]: Warning: dracut-initqueue: starting timeout scripts
[ 209.332777] dracut-initqueue[429]: Warning: Could not boot.
Starting Dracut Emergency Shell...
Warning: /dev/mapper/loop0p4 does not exist
Generating "/run/initramfs/rdsosreport.txt"
Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.
dracut:/#
I am new to the forum, so I will try to figure out how to attach the whole console log.
We are in the process of investigating this. We’re not entirely sure why /dev/loop0 is being referenced (it starts at /etc/kernel/cmdline). Before updating an instance, you could change that file so that the root matches whatever is in /proc/cmdline at that time, which should technically get around the issue. However, we want to make sure that neither you nor anyone else has to do that.
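As a rough illustration of the workaround above, here is a minimal Python sketch that takes the root= argument from the running kernel command line and substitutes it into the configured one. The function names (root_arg, sync_root, proposed_cmdline) are made up for this example, and it assumes arguments are separated by plain whitespace with no quoted values. It only computes the proposed line and writes nothing, so you can compare it with the current file before changing anything.

```python
from typing import Optional


def root_arg(cmdline: str) -> Optional[str]:
    """Return the value of the last root= argument, or None if there is none."""
    value = None
    for token in cmdline.split():
        if token.startswith("root="):
            value = token[len("root="):]
    return value


def sync_root(running: str, configured: str) -> str:
    """Return `configured` with its root= argument taken from `running`.

    Assumption: arguments are whitespace-separated and unquoted.
    If `running` has no root= argument, `configured` is returned unchanged.
    """
    live = root_arg(running)
    if live is None:
        return configured
    tokens = [
        "root=" + live if token.startswith("root=") else token
        for token in configured.split()
    ]
    # If the configured line had no root= at all, add the running one.
    if not any(token.startswith("root=") for token in tokens):
        tokens.append("root=" + live)
    return " ".join(tokens)


def proposed_cmdline(
    running_path: str = "/proc/cmdline",
    configured_path: str = "/etc/kernel/cmdline",
) -> str:
    """Read both files and return the proposed contents; nothing is written."""
    with open(running_path) as handle:
        running = handle.read()
    with open(configured_path) as handle:
        configured = handle.read()
    return sync_root(running, configured)
```

On an affected instance you could call proposed_cmdline() before running the update and review the result against /etc/kernel/cmdline by hand. Treat it as a sketch, not a supported fix.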
I’ll also point out we do provide LVM variants of our cloud images. The LVM variants of our images are not affected by this, if you wanted to try it out or use it.
Thank you for pointing this out, we’ll be working on it throughout the day and hopefully have an idea for a fix soon.
As an update, @neil and I have drilled down to the primary issue that was causing this. Because of it, we will be rebuilding and reissuing all of the cloud images for 9.4 on all applicable architectures. This process can take a while, but once it is all done, we will post here to let you know they’re good to go.
Thank you all for your patience and apologies for this unfortunate (and rather strange) inconvenience.
This is undoubtedly a “me” problem but I’m only seeing 9.3.20231113 in the AWS Marketplace. I might be doing it wrong.
edit: Remembered the old “search for 792107900819” so ignore me. Unless that is also sub-optimal.
Indeed, if you take the 9.3 AMI and then do a yum/dnf update, it will update to 9.4 and reboot without issues.
One thing that I noticed is that the 9.4 AMI is not mentioned on Download - Rocky Linux
It only exists on AWS public images with the Rocky Linux account “792107900819”
The “aws-marketplace” account owner “679593333241” doesn’t have the 9.4 AMI, probably because the 9.4 AMI provided by Rocky Linux team is faulty, thus not passing AWS validation.
I’m now wondering whether I should use the aws-marketplace owner 679593333241 instead of the Rocky account 792107900819 for my Packer and Terraform data sources, in case that spares me some trouble in the future.
Just to note (more for search probably, I see you say it affects all non-uefi cloud images) same issue on OpenStack using Rocky-9-GenericCloud-Base-9.4-20240509.0.x86_64.qcow2
Apologies for the delays. We are also dealing with 8.10 work at the same time as this. We have built new images, we just need to upload, test, and publish. Our initial tests with the generic cloud images show that updates should no longer break the images.
OK, having stumbled on that new image, I just tested it. It looks like the rebuild didn’t pull in the newest packages, as it still wants to update the kernel, but after doing a dnf update * I can confirm that the image does not hang after a reboot.
I noticed that grub reconfiguration was not triggered after kernel packages update as per usual.
Fixed reboot issue on a faulty Rocky-9-EC2-Base-9.4-20240509.0.x86_64 AMI by:
The 792107900819 account is missing the latest aarch64 image. It currently shows Rocky-9-EC2-Base-9.4-20240509.0.aarch64. For x86_64 there is Rocky-9-EC2-Base-9.4-20240523.0.x86_64, which I think resolved this issue.
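For anyone comparing build dates across architectures, here is a small Python sketch that parses the image names quoted in this thread and picks the newest build per architecture. The name pattern is inferred only from those names (Rocky-<major>-EC2-<variant>-<release>-<date>.<revision>.<arch>), so treat it as an assumption, not a documented naming scheme. The helper names parse_name and newest_per_arch are made up for this example.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Pattern inferred from names seen in this thread; an assumption, not a spec.
NAME_RE = re.compile(
    r"^Rocky-(?P<major>\d+)-EC2-(?P<variant>[A-Za-z]+)-(?P<release>\d+\.\d+)"
    r"-(?P<date>\d{8})\.(?P<rev>\d+)\.(?P<arch>\w+)$"
)


def parse_name(name: str) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (arch, (build_date, revision)), or None if the name does not match."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    return match.group("arch"), (int(match.group("date")), int(match.group("rev")))


def newest_per_arch(names: Iterable[str]) -> Dict[str, str]:
    """Map each architecture to the name with the highest (date, revision)."""
    best: Dict[str, Tuple[Tuple[int, int], str]] = {}
    for name in names:
        parsed = parse_name(name)
        if parsed is None:
            continue  # skip names that do not follow the assumed pattern
        arch, build = parsed
        if arch not in best or build > best[arch][0]:
            best[arch] = (build, name)
    return {arch: name for arch, (_, name) in best.items()}


names = [
    "Rocky-9-EC2-Base-9.4-20240509.0.x86_64",
    "Rocky-9-EC2-Base-9.4-20240523.0.x86_64",
    "Rocky-9-EC2-Base-9.4-20240509.0.aarch64",
]
newest = newest_per_arch(names)
```

With the three names from this thread, newest maps x86_64 to the 20240523 build while aarch64 is still on 20240509, which matches the gap described above.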