System won't boot with missing mirror disk?

I built a new machine with two new SSDs. In the installer I picked RAID1 for /boot and / (and no swap). The system installed cleanly, and after the reboot /proc/mdstat shows the two mirror partitions.

So far so good.

But, of course, a mirror isn’t helpful if the system won’t boot with a bad disk, so that was my next test: I rebooted with one disk missing.

Grub displayed as normal (yay) and then the kernel started booting… but the OS never finished coming up. With “rhgb” and “quiet” removed we can see it get so far and then stop.

After pressing C-A-D a few times I eventually see a “waiting for start job…disk/by-uuid/…” message that stays there for many minutes.

In the attached picture, the line at [7.712616] logitech-hidpp.... is where the boot hangs and where I press C-A-D. The following lines are all that show after that.

Now on a normal boot…

Kernel command line: BOOT_IMAGE=(mduuid/49e9de19e56e702edda216dee6a7ee85)/vmlinuz-4.18.0-348.2.1.el8_5.x86_64 root=UUID=811e5cfb-8f2f-43fd-bbed-f113829bdece ro crashkernel=auto 3b7ded8a:cad6a06c

/etc/fstab reads:

UUID=811e5cfb-8f2f-43fd-bbed-f113829bdece /                       ext4    defaults        1 1
UUID=e3256ba4-bb21-4396-97dd-8d4b7e0d95fc /boot                   ext4    defaults        1 2

The 3193... is the UUID of the RAID1 for the root disk.
The 49e9... is the UUID of the RAID1 for the boot disk.
The 811e... is the UUID of the ext4 filesystem on root.
The e325... is the UUID of the ext4 filesystem on boot.

So everything looks configured correctly… but it just hangs when one of the disks is missing, having trouble finding the 811e... (root) filesystem!
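The "looks configured correctly" check above can be done mechanically rather than by eye. A minimal sketch, assuming the kernel command line and fstab have the shape quoted in this post; the function names cmdline_root_uuid and fstab_root_uuid are made up for illustration:

```shell
# Print the UUID given as root=UUID=... in a kernel command line string.
cmdline_root_uuid() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^root=UUID=//p'
}

# Print the UUID of the fstab entry whose mount point is "/".
fstab_root_uuid() {
    awk '$1 ~ /^UUID=/ && $2 == "/" { sub(/^UUID=/, "", $1); print $1 }' "$1"
}

# Example use on a running system (not executed here):
#   [ "$(cmdline_root_uuid "$(cat /proc/cmdline)")" = "$(fstab_root_uuid /etc/fstab)" ] \
#       && echo "root UUIDs match"
```

If the two values match, as they do here, the hang is unlikely to be a plain UUID mismatch.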

Could it be the raid disk isn’t coming up in degraded mode when one disk is missing, and so is hanging? Or could something else be involved?
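One way to test the degraded-mode idea, once a shell is available, is to look for arrays whose member status in /proc/mdstat contains an underscore (for example [U_]). A rough sketch; the function name degraded_arrays is made up, and it assumes the usual mdstat layout where the status line follows the "mdN : ..." line:

```shell
# List md arrays whose member status (e.g. [U_]) shows a missing member.
# Reads /proc/mdstat by default; a file path can be passed for testing.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md127 : active raid1 ..."
        /^md[^ ]* :/ { name = $1; next }
        # The status field is made up only of U (up) and _ (missing)
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
        }
    ' "${1:-/proc/mdstat}"
}
```

An array that was never started would probably not show a status line at all, so an empty result from this check does not by itself prove that every array is healthy.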

Any ideas welcome!

OK, I think I’ve solved it. It’s a problem that goes back to RHEL 7, apparently…

See 1701504 – Reduce dracut timeout (rd.retry) for discussion.

The default timeout appears to be around 3 minutes, so if I wait long enough it does finally boot.

The time can be reduced by setting rd.retry=30 in /etc/sysconfig/grub and rebuilding grub.cfg.
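For anyone wanting to script that change, here is a minimal sketch. It assumes the file contains a single double-quoted GRUB_CMDLINE_LINUX="..." line and that GNU sed is available; the helper name add_kernel_arg is made up, and the grub.cfg path in the usage comment is an assumption for a BIOS install (EFI systems keep it elsewhere):

```shell
# Append a kernel argument to GRUB_CMDLINE_LINUX in a grub defaults file,
# doing nothing if the argument is already there.
# Note: the argument must not contain "|", "&" or "\" (sed metacharacters here).
add_kernel_arg() {
    file="$1"
    arg="$2"
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$arg"; then
        return 0
    fi
    sed -i "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"|GRUB_CMDLINE_LINUX=\"\1 ${arg}\"|" "$file"
}

# Example use as root (not executed here):
#   add_kernel_arg /etc/sysconfig/grub rd.retry=30
#   grub2-mkconfig -o /boot/grub2/grub.cfg   # assumed BIOS location of grub.cfg
```

After the next reboot, /proc/cmdline should show rd.retry=30 if the change took effect.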