Converted Rocky 8 from a single SSD to MD RAID1; UEFI boot broke

Hello, I had a Rocky 8.6 system on a single NVMe SSD. I installed a second, identical NVMe SSD.
I followed the RHEL guide “How to convert single disk EFI boot to software RAID after installation” (Red Hat Customer Portal) for the process.

But after completing the process and waiting for the MD RAID to rebuild, I rebooted the system, and grub just shows “System setup” as its only line…

I think something in the grub config must not have gone as expected?

Here are the steps from that document, adjusted for this system:

Populating /etc/mdadm.conf

#mdadm --examine --scan > /etc/mdadm.conf

#cat /etc/mdadm.conf

ARRAY /dev/md/0  metadata=1.0 UUID=73bd4068:03d23c38:17aff7cc:d6ae2150 name=HGX1:0
ARRAY /dev/md/1  metadata=1.0 UUID=71159c04:22880aed:01f2dfab:fd383f5c name=HGX1:1
ARRAY /dev/md/2  metadata=1.0 UUID=2607b270:a3efc2ba:3c7bf174:eb3d051e name=HGX1:2
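
(As a sanity check, not part of the guide: before touching the boot configuration it may be worth confirming the arrays are assembled and clean, e.g.:)

#cat /proc/mdstat
#mdadm --detail /dev/md0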

Adding those UUIDs to /etc/default/grub

...
GRUB_CMDLINE_LINUX="rd.md.uuid=73bd4068:03d23c38:17aff7cc:d6ae2150  rd.md.uuid=71159c04:22880aed:01f2dfab:fd383f5c  rd.md.uuid=2607b270:a3efc2ba:3c7bf174:eb3d051e crashkernel=auto resume=/dev/mapper/rl-swap rd.lvm.lv=rl/root rd.lvm.lv=rl/swap rd.driver.blacklist=nouveau"
...

Updating grub2.cfg

#grub2-editenv - list | grep kernelopts

kernelopts=root=/dev/mapper/rl-root ro crashkernel=auto resume=/dev/mapper/rl-swap rd.lvm.lv=rl/root rd.lvm.lv=rl/swap rd.driver.blacklist=nouveau

#grub2-editenv - set "kernelopts=root=/dev/mapper/rl-root rd.md.uuid=73bd4068:03d23c38:17aff7cc:d6ae2150 rd.md.uuid=71159c04:22880aed:01f2dfab:fd383f5c rd.md.uuid=2607b270:a3efc2ba:3c7bf174:eb3d051e ro crashkernel=auto resume=/dev/mapper/rl-swap rd.lvm.lv=rl/root rd.lvm.lv=rl/swap rd.driver.blacklist=nouveau"

#grub2-editenv - list | grep kernelopts

kernelopts=root=/dev/mapper/rl-root rd.md.uuid=73bd4068:03d23c38:17aff7cc:d6ae2150 rd.md.uuid=71159c04:22880aed:01f2dfab:fd383f5c rd.md.uuid=2607b270:a3efc2ba:3c7bf174:eb3d051e ro crashkernel=auto resume=/dev/mapper/rl-swap rd.lvm.lv=rl/root rd.lvm.lv=rl/swap rd.driver.blacklist=nouveau

Updating the EFI entries; first, remove the existing entry

#efibootmgr -v | grep Linux
Boot0000* Rocky Linux   HD(1,GPT,171f38a4-27bf-4091-a6da-7a9e90f44824,0x800,0x12c000)/File(\EFI\ROCKY\SHIMX64.EFI)

#efibootmgr -b 0 -B
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0015,0007,0004,0005,0006,0003,0008,0009,000A,000B,000C,000D,000E,0001
Boot0001* UEFI: Built-in EFI Shell
... (all kinds of PXE entries for all of the ports on this computer)
 

Adding the new entries

#efibootmgr -c -d /dev/nvme0n1 -p1 -l \\EFI\\ROCKY\\SHIMX64.EFI -L "Rocky Linux"
BootOrder: 0000,0015,0007,0004,0005,0006,0003,0008,0009,000A,000B,000C,000D,000E,0001
Boot0000* Rocky Linux   HD(1,GPT,171f38a4-27bf-4091-a6da-7a9e90f44824,0x800,0x12c000)/File(\EFI\ROCKY\SHIMX64.EFI)
...

#efibootmgr -c -d /dev/nvme1n1 -p1 -l \\EFI\\ROCKY\\SHIMX64.EFI -L "Rocky Linux"
efibootmgr: ** Warning ** : Boot0000 has same label Rocky Linux
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0002,0000,0015,0007,0004,0005,0006,0003,0008,0009,000A,000B,000C,000D,000E,0001
Boot0000* Rocky Linux
Boot0001* UEFI: Built-in EFI Shell
...
Boot0015* UEFI OS
Boot0002* Rocky Linux

Verifying what's there:

#efibootmgr -v
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0002,0000,0015,0007,0004,0005,0006,0003,0008,0009,000A,000B,000C,000D,000E,0001
Boot0000* Rocky Linux   HD(1,GPT,171f38a4-27bf-4091-a6da-7a9e90f44824,0x800,0x12c000)/File(\EFI\ROCKY\SHIMX64.EFI)
Boot0001* UEFI: Built-in EFI Shell      VenMedia(5023b95c-db26-429b-a648-bd47664c8012)..BO
Boot0002* Rocky Linux   HD(1,GPT,fe3037df-8dd0-4187-95a5-b01ed0140262,0x800,0x12c000)/File(\EFI\ROCKY\SHIMX64.EFI)
Boot0003* UEFI: PXE: IPv4 Mellanox Network Adapter ...
...
Boot0015* UEFI OS       HD(1,GPT,171f38a4-27bf-4091-a6da-7a9e90f44824,0x800,0x12c000)/File(\EFI\BOOT\BOOTX64.EFI)..BO

Rebuilding the initramfs with --mdadmconf

#dracut -f -v --mdadmconf
...
dracut: *** Creating image file '/boot/initramfs-4.18.0-425.19.2.el8_7.x86_64.img' ***
dracut: *** Creating initramfs image file '/boot/initramfs-4.18.0-425.19.2.el8_7.x86_64.img' done ***
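
(A possible sanity check at this point: lsinitrd, which ships with dracut, can confirm the new image actually picked up the mdadm config, e.g.:)

#lsinitrd /boot/initramfs-4.18.0-425.19.2.el8_7.x86_64.img | grep -i mdadm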

To me it looks like it is not grub that you are seeing. I think the first two UEFI entries are broken and it skips to Boot0015* UEFI OS.
Can you boot from an ISO and check those?
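
For example (a sketch; the device names are assumptions for this system), from a live environment you could compare the partition GUIDs the boot entries point at against the actual partitions:

#efibootmgr -v | grep 'Rocky Linux'
#lsblk -o NAME,PARTUUID /dev/nvme0n1 /dev/nvme1n1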

Yes, I can try that out, but would the UEFI OS entry also have a grub> shell?

If I hit “c” to get to the grub> shell, I can type “set” and it shows all kinds of variables, a number of which I recognize from the settings above.

grub> set

...
prefix=(hd0,gpt1)/EFI/rocky
pxe_default_server=
root=hd0,gpt1
...
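
(From the grub shell you can also list what each candidate root actually contains, to see which one holds the grub config and the kernels; a sketch:)

grub> ls (hd0,gpt1)/EFI/rocky
grub> ls (hd0,gpt2)/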

After much reading and poking around, I can get the system to boot by doing the following:

grub> set root=hd0,gpt2
grub> normal

That brings up the correct menu with the usual kernels listed, and then the system boots up without issues!
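
(For reference, the same thing can be done without hardcoding the partition, assuming you know the filesystem UUID of /boot; the UUID below is a placeholder:)

grub> search --no-floppy --fs-uuid --set=root <UUID-of-/boot>
grub> normal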

But I can't figure out where the incorrect set root=hd0,gpt1 is coming from that prevents the grub menu from showing up in the first place!

So I am not sure how to fix this. My search results usually point to the kernelopts line

kernelopts=root=/dev/mapper/rl-root ...

but I think this is a different setting?

If you included the ESP partition in your RAID setup, then in effect you have changed its path as understood by the EFI boot manager, so it can't be found.
If you enter the command:

efibootmgr -v

it will list the path it is searching for. I don't use RAID, so I can't advise on what the new path might look like, but if you read the man page for efibootmgr you might get an idea of what you need to do.
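
(A concrete way to check this, sketched with device names assumed from the output above: compare the PARTUUID inside the HD(...) node of each entry against the actual ESP partitions:)

#efibootmgr -v | grep SHIMX64
#blkid -s PARTUUID -o value /dev/nvme0n1p1
#blkid -s PARTUUID -o value /dev/nvme1n1p1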

Check what grep "root" grub.cfg says. This should be setting the root to the /boot partition. Mine looks like this:

search --no-floppy --fs-uuid --set=root 6097d292-3d67-46e4-a221-84b31f7fb3ff

The UUID is that of my /boot partition. If it does not align with yours, you can try regenerating your grub2.cfg. It should actually be pretty straightforward:

grub2-mkconfig -o /etc/grub2-efi.cfg
You should beware, though, in case you are experiencing the same issue as described in “Grub Empty after dnf upgrade”.
I also have this issue on my RL8 VM and could not find any other solution than setting GRUB_ENABLE_BLSCFG=false in /etc/default/grub and regenerating the grub2.cfg.
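
(A sketch of that workaround, assuming GRUB_ENABLE_BLSCFG is not already set in /etc/default/grub, and the stock Rocky 8 layout where /etc/grub2-efi.cfg is a symlink to the grub.cfg on the ESP; if the variable already exists, edit the line instead:)

echo 'GRUB_ENABLE_BLSCFG=false' >> /etc/default/grub
grub2-mkconfig -o /etc/grub2-efi.cfg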
