Struggling to restore GRUB menu entries on UEFI dual-boot machine

Hello, I’d love if someone could provide some help with fixing my RL9 installation.

A few days ago my GRUB broke for reasons I still don’t know (I suspect one of the Windows updates), and when I tried to boot RL9 I got an error that mentioned a corrupt volume. I then used the Boot-Repair LiveCD to try to repair GRUB painlessly, but that didn’t work. I don’t remember the details, but from the LiveCD I managed to restore GRUB to a state where I could reach the GRUB shell. After hours of research I managed to boot RL from the GRUB shell with the following commands:

grub> set root=(hd0,msdos3)
grub> linux /vmlinuz-[some version] root=/dev/mapper/rl-root
grub> initrd /initramfs-[some version].img
grub> boot

The problem I’m facing now is that this is the only way I can boot RL, and I’d rather not type these commands every day. I tried the following methods to restore the menu entries:

  1. Deleted the grub.cfg in both /boot/grub2 and /boot/EFI/rocky and ran dnf reinstall shim-* grub2-efi-* grub2-common from RL and from LiveCD.
  2. Added a new custom entry to /etc/grub.d/40_custom with the commands I’m now entering manually in the GRUB shell and rebuilt the config with grub2-mkconfig -o /boot/grub2/grub.cfg
  3. Added new entries with efibootmgr pointing to shimx64.efi and grubx64.efi on the EFI partition.
  4. Manually edited the stub grub.cfg in EFI partition to point to the separate boot partition instead of rl-root partition.

After every attempt I rebooted my machine and, when selecting an EFI boot menu entry, expected to see GRUB with menu entries that could let me boot RL (or the ones that should load with blscfg from /boot/loader/entries), but none of these worked for me.
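Since blscfg builds the menu from the files in /boot/loader/entries, it can help to confirm from the running system that those entries exist and carry a title. The following is only a rough sketch: the function name list_bls_titles is made up for illustration, and reading the directory may need sudo depending on its permissions.

```shell
#!/bin/sh
# Print "<file>: <title>" for each BLS entry found in a directory.
# The directory defaults to /boot/loader/entries (the path named above).
# list_bls_titles is a hypothetical helper name, not part of any package.
list_bls_titles() {
    dir="${1:-/boot/loader/entries}"
    for f in "$dir"/*.conf; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$f" ] || continue
        # Take the first line starting with "title" and strip the keyword.
        title=$(sed -n 's/^title[[:space:]]\{1,\}//p' "$f" | head -n 1)
        printf '%s: %s\n' "$(basename "$f")" "${title:-<no title line>}"
    done
}

# Example call (not executed here):
#   list_bls_titles /boot/loader/entries
```

If this prints nothing, there are no entries for blscfg to load. If it prints entries but the menu still doesn’t show, the problem is more likely in how GRUB finds its grub.cfg.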

Here’s some potentially useful information about my machine:

[puzovoz@localhost ~]$ sudo lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0 223.6G  0 disk 
├─sda1        8:1    0 119.7G  0 part 
├─sda2        8:2    0   600M  0 part 
├─sda3        8:3    0     1G  0 part /boot
├─sda4        8:4    0     1K  0 part 
└─sda5        8:5    0 102.3G  0 part 
  ├─rl-swap 253:0    0    18G  0 lvm  [SWAP]
  ├─rl-home 253:1    0  53.8G  0 lvm  /home
  └─rl-root 253:2    0  30.5G  0 lvm  /
sdb           8:16   0 931.5G  0 disk 
└─sdb1        8:17   0 931.5G  0 part 
nvme0n1     259:0    0 238.5G  0 disk 
├─nvme0n1p1 259:1    0   579M  0 part 
└─nvme0n1p2 259:2    0 237.9G  0 part 
[puzovoz@localhost ~]$ sudo parted -l
Model: ATA WDC WDS240G2G0A- (scsi)
Disk /dev/sda: 240GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End    Size    Type      File system  Flags
 1      1049kB  129GB  129GB   primary   ntfs
 2      129GB   129GB  629MB   primary   fat32        boot
 3      129GB   130GB  1074MB  primary   xfs
 4      130GB   240GB  110GB   extended               lba
 5      130GB   240GB  110GB   logical                lvm


Model: ATA WDC WD10EZRZ-22H (scsi)
Disk /dev/sdb: 1000GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  1000GB  1000GB  primary  ntfs         boot


Model: SAMSUNG MZVLQ256HBJD-00B00 (nvme)
Disk /dev/nvme0n1: 256GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End    Size   Type     File system  Flags
 1      1049kB  608MB  607MB  primary  ntfs
 2      608MB   256GB  255GB  primary  ntfs         boot
[puzovoz@localhost ~]$ efibootmgr -uv
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0000,0003,0008,000A,0009
Boot0000* Windows Boot Manager	HD(2,MBR,0x3bdf9fec,0x122000,0x1dbd0000)/File(\EFI\MICROSOFT\BOOT\BOOTMGFW.EFI)䥗䑎坏S
Boot0001* Rocky Linux	HD(2,MBR,0xf9e19a73,0xef6a800,0x12c000)/File(\EFI\ROCKY\SHIMX64.EFI)
Boot0003* UEFI: Built-in EFI Shell	VenMedia(5023b95c-db26-429b-a648-bd47664c8012)
Boot0008* UEFI OS	HD(2,MBR,0xf9e19a73,0xef6a800,0x12c000)/File(\EFI\BOOT\BOOTX64.EFI)
Boot0009  Windows Boot Manager	HD(2,MBR,0xf9e19a73,0xef6a800,0x12c000)/File(\EFI\MICROSOFT\BOOT\BOOTMGFW.EFI)
Boot000A* Hard Drive	BBS(HD,,0x0)

I’d appreciate if someone could provide a way to restore GRUB without reinstalling the system as I don’t want to potentially have to reinstall again in the future in case something breaks again.

Can you post the contents of /etc/default/grub

If you are using Secureboot then you point it to shimx64.

One thing I can’t figure out is how UEFI works with msdos partitioned disks. I could never figure out how to force that mode when I tried it. When I installed my rl9 a year ago or so it was to a msdos disk in bios boot mode, meaning I had to do a grub2-install to the mbr. I later converted my disk to the gpt partition type and then did all the steps to set it up to boot UEFI, more than I can articulate here.
Just to understand better the current state of your firmware boot settings could you take a look in the firmware and see whether UEFI is enabled and/or legacy (csm) for the boot methods?

Can you post the contents of /etc/default/grub

[puzovoz@localhost ~]$ cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/rl-swap rd.lvm.lv=rl/root rd.lvm.lv=rl/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
GRUB_DISABLE_OS_PROBER=false

If you are using Secureboot then you point it to shimx64.

Yes but as far as I know even with disabled secure boot it should simply redirect control to grubx64.efi and I was just trying to see if at least something could work lol.
Also, speaking of secure boot, I tried using the grub2-install utility but it completely refuses to do anything on EFI platforms because of secure boot.

[puzovoz@localhost ~]$ grub2-install
grub2-install: error: this utility cannot be used for EFI platforms because it does not support UEFI Secure Boot.

This happens even with secure boot disabled (it was apparently in so-called setup mode when I first tried repairing the system, and I have since disabled it completely in the UEFI settings).

[puzovoz@localhost ~]$ mokutil --sb-state
SecureBoot disabled

Just to understand better the current state of your firmware boot settings could you take a look in the firmware and see whether UEFI is enabled and/or legacy (csm) for the boot methods?

I checked UEFI settings and found that CSM is enabled. I’m not actually sure how that could affect me.
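With CSM enabled, the firmware can boot either way, so it is worth checking which mode the running system actually used. A minimal sketch, relying on the fact that the kernel only creates /sys/firmware/efi when it was started through UEFI (boot_mode is a hypothetical helper name):

```shell
#!/bin/sh
# Report whether the running kernel was started through UEFI or legacy BIOS/CSM.
# The kernel only creates /sys/firmware/efi when it was booted via UEFI.
# The optional argument exists purely so the check can be tried on another path.
# boot_mode is a hypothetical helper name.
boot_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "UEFI"
    else
        echo "BIOS/CSM"
    fi
}

# Example call (not executed here):
#   boot_mode
```

The efibootmgr output earlier in the thread, with its BootCurrent line, already suggests the system was booted in UEFI mode, so this is just a second way to confirm it.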

One thing I can’t figure out is how UEFI works with msdos partitioned disks.

When I saw that grub2-install error message I was wondering how my system is currently even working. For a bit of context, I had Windows 10 installed on this for a few years (I guess msdos partitioned) until I dual-boot installed RL9 on a partition on the same disk as the Windows system. I then later wanted to separate the two and bought a new drive where I tried to move the linux installation. Didn’t work, made many newbie mistakes in the process and had to reinstall linux in the end, at least I managed to rescue Windows without losing data. Before this I had no idea how any of this works so when I installed either Windows or RL I didn’t know the differences between partition tables, I probably would have chosen gpt if given the choice now. How RL managed to flawlessly install everything is a mystery to me and now I’d like to replicate what they did to restore my GRUB.

I understand now how your system boots in UEFI mode. Win10 will install an esp on a msdos partition type disk, and because that partition existed when you installed rl9, it also installed in UEFI mode. But we’re back to square one since I don’t see anything that would prevent a grub menu from being displayed.
Just to summarize; you can boot windows via the firmware menu. Selecting rocky on that same menu brings you to a blank screen or grub prompt. You have two entries in the efi boot manager for rocky that point to the same EFI/rocky folder on the esp. The grub.cfg …EFI/rocky is a stub file pointing to the UUID of your /boot partition where the real grub.cfg resides. You haven’t changed graphics card or monitor in this time period.
Recently, at least on Fedora, an option was added to the grub stub file line that was something to the effect of “--root-device-only” preceding the “--set=dev” option. This broke any system that had a certain type of raid install. I don’t see it in my stub file below:

search --no-floppy --fs-uuid --set=dev 7277c17d-6897-497b-9390-1153c9bff018
set prefix=($dev)/boot/grub2
export $prefix
configfile $prefix/grub.cfg

But it is worth checking for and removing if it appears in yours. I think it was quickly rescinded as it no longer shows up in my latest stub file.
I can’t provide a solution so just throwing possibilities out there that can be checked off.

Recently, at least on Fedora, an option was added to the grub stub file line that was something to the effect of “--root-device-only” preceding the “--set=dev” option.

Indeed, I do have an option in my grub.cfg like this, it’s --root-dev-only. I do not have a raid setup, but just in case I tried removing it to see if it could change anything. Tried rebooting but didn’t notice any changes.

The grub.cfg …EFI/rocky is a stub file pointing to the UUID of your /boot partition where the real grub.cfg resides.

Interesting, because my stub grub.cfg originally pointed to /rl-root partition rather than /boot. I tried changing the UUID to the one of /boot partition but that somehow doesn’t seem to be the solution either, even though it does sound like it makes lots of sense.

I’ll add some more info for a peer review of sorts, maybe I’m misunderstanding something:

[puzovoz@localhost ~]$ cat /run/media/puzovoz/AADE-8F12/EFI/rocky/grub.cfg
# this is the line that was in the config before I made these changes
# search --no-floppy --root-dev-only --fs-uuid --set=dev aaacc6f4-6ee6-4d6a-8d62-5641234898df
search --no-floppy --fs-uuid --set=dev e2fcbd9a-5374-4152-b66e-79dde914c9ce
set prefix=($dev)/boot/grub2
export $prefix
configfile $prefix/grub.cfg
[puzovoz@localhost ~]$ sudo blkid
/dev/mapper/rl-home: UUID="e80105d9-8432-4abe-aa34-131862501dc1" TYPE="xfs"
/dev/nvme0n1p1: LABEL="M-PM-^WM-PM-0M-QM-^@M-PM-5M-PM-7M-PM-5M-QM-^@M-PM-2M-PM-8M-QM-^@M-PM->M-PM-2M-PM-0M-PM-=M-PM-> M-QM-^AM-PM-8M-QM-^AM-QM-^BM-PM-5M-PM-<M-PM->M-PM-9" UUID="6638CF5C38CF29BF" TYPE="ntfs" PARTUUID="3bdf9fec-01"
/dev/nvme0n1p2: LABEL="system" UUID="9C2CD4472CD41E58" TYPE="ntfs" PARTUUID="3bdf9fec-02"
/dev/sdb2: UUID="AADE-8F12" TYPE="vfat" PARTUUID="f9e19a73-02"
/dev/sdb5: UUID="3pEbc7-ytca-2qmH-shZi-Zu9R-Z1sV-Q9F1XJ" TYPE="LVM2_member" PARTUUID="f9e19a73-05"
/dev/sdb3: UUID="e2fcbd9a-5374-4152-b66e-79dde914c9ce" TYPE="xfs" PARTUUID="f9e19a73-03"
/dev/sdb1: LABEL="ssd-storage" UUID="FC1434151433D17E" TYPE="ntfs" PARTUUID="f9e19a73-01"
/dev/mapper/rl-root: UUID="aaacc6f4-6ee6-4d6a-8d62-5641234898df" TYPE="xfs"
/dev/mapper/rl-swap: UUID="74f9c706-136b-48d3-a950-d99033fa0308" TYPE="swap"
/dev/sda1: LABEL="storage" UUID="AC68759368755D50" TYPE="ntfs" PARTUUID="0f79f3b0-01"
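To cross-check the stub against blkid without comparing UUIDs by eye, something like the sketch below could be used. It assumes the UUID is the last word on the uncommented search line, as in the stub shown above. stub_uuid is a made-up helper name.

```shell
#!/bin/sh
# Print the filesystem UUID that the active (uncommented) "search" line
# of a stub grub.cfg looks for. Assumes the UUID is the last word on the
# line, as it is in the stub shown above.
# stub_uuid is a hypothetical helper name.
stub_uuid() {
    sed -n 's/^[[:space:]]*search[[:space:]].*[[:space:]]\([^[:space:]]\{1,\}\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Example comparison against the filesystem mounted at /boot (not executed here):
#   stub_uuid /run/media/puzovoz/AADE-8F12/EFI/rocky/grub.cfg
#   findmnt -no UUID /boot
```

If the two values match, the stub is searching for the /boot filesystem, and whatever follows ($dev) in the prefix line is then resolved relative to the root of that filesystem.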

Well in my experience /boot has always been an ext4 file system because “lvm’s” are not readily readable by grub or bls. But for some reason it worked for you before a recent update. To me this is the issue. Why did it work but not now, I don’t know. I should have read your first post up thread more closely as this information was there.

Oh, the other thing that is missing is there is no device mapper entry for /boot on that partition like there is for rl-root and swap. That may be the issue.

Did you restore the device mapper entry for the boot partition? I think that is the problem (corruption) noted in your original post. I don’t know how to do that for lvm’s, never use them intentionally.

Device mapper has entries for things like LVM and iSCSI volumes.
The /boot cannot be an LVM volume, because GRUB can’t read LVM.
Therefore, the boot does not usually have entry in /dev/mapper.

GRUB can read XFS (and several other filesystems). Just not LVM.

Didn’t have the time to test anything new yet.

I think that is the problem (corruption) noted in your original post.

The corruption problem I mentioned in the original post is no longer relevant. It was the first error I encountered, but it was resolved rather quickly after, I assume, running the Boot-Repair utility.
But for reference, I found that I still had the error messages saved from back then:

Unexpected return from initial_read: Volume Corrupt, buffersize 1000
Failed to load image start_image() returned Volume Corrupt

When trying to boot the same EFI entry for the second time it returned a different but similar message:

Failed to open \EFI\ROCKY\grubx64.efi - Volume Corrupt
Failed to load image start_image() returned Volume Corrupt

I’m thinking of maybe somehow debugging GRUB to see why it’s failing to read the config that’s sitting there for it to read but I couldn’t find any ways to do so.

Another look a few days later at the config made it all so very obvious lol.
I used the Arch Linux Forums thread “grub starts shell instead of the menu.” (Installation section) to assist with debugging.
First I made sure that at least the first stub config file was loading correctly and checked the variables it set. Then I tried to load the config with configfile and got a file not found error, which is what I expected; I just wasn’t sure why exactly it couldn’t find the file.
The problem was that $prefix added an extra /boot directory to the path. Since we changed the search line to look for the boot partition directly, the grub2 directory sits at the root of the new $dev, so the prefix has to be ($dev)/grub2 rather than ($dev)/boot/grub2.

This is how the current config looks:

[puzovoz@localhost ~]$ cat /run/media/puzovoz/AADE-8F12/EFI/rocky/grub.cfg
#search --no-floppy --root-dev-only --fs-uuid --set=dev aaacc6f4-6ee6-4d6a-8d62-5641234898df
search --no-floppy --fs-uuid --set=dev e2fcbd9a-5374-4152-b66e-79dde914c9ce
set prefix=($dev)/grub2
export $prefix
configfile $prefix/grub.cfg

After adding this small change to the config file, it now all works correctly as I wanted it to! Thanks for helping and staying with me over these few days, I really appreciate it.
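For anyone landing here with the same symptom, the mismatch can also be spotted from the running system. The sketch below is only an illustration under the assumptions already made in this thread: the stub has a line of the form set prefix=($dev)/..., and the filesystem the search line finds is mounted somewhere (here /boot). The helper names stub_prefix_path and check_stub_prefix are made up.

```shell
#!/bin/sh
# Print the path part of the stub's "set prefix=($dev)/..." line,
# i.e. everything after the closing parenthesis.
# stub_prefix_path and check_stub_prefix are hypothetical helper names.
stub_prefix_path() {
    sed -n 's/^[[:space:]]*set[[:space:]]\{1,\}prefix=([^)]*)\(.*\)$/\1/p' "$1" | head -n 1
}

# Check that <mountpoint><prefix path>/grub.cfg exists, where <mountpoint>
# is where the filesystem found by the stub's "search" line is mounted.
check_stub_prefix() {
    stub="$1"
    mountpoint="$2"
    path=$(stub_prefix_path "$stub")
    if [ -f "$mountpoint$path/grub.cfg" ]; then
        echo "ok: $mountpoint$path/grub.cfg exists"
    else
        echo "missing: $mountpoint$path/grub.cfg"
        return 1
    fi
}

# Example call (not executed here):
#   check_stub_prefix /run/media/puzovoz/AADE-8F12/EFI/rocky/grub.cfg /boot
```

With the old prefix of ($dev)/boot/grub2 and the search pointing at the /boot partition, this would look for /boot/boot/grub2/grub.cfg and report it missing, which matches the file not found error seen in the GRUB shell.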

Congrats. It makes sense now that I’ve compared it with the earlier stub file, but it is not obvious.