EL8 equivalent procedure for migrating nfsroot to localboot?

I maintained system images that were swapped back and forth between pxeboot/nfsroot and locally installed for many years (pre-RedHat redhat linux through RHEL6), but only recently (Rocky 8.6) tried migrating the procedure to EL8. I’ve run into a problem I’ve not seen before: every command returns file-not-found, including intrinsics that shouldn’t be affected by missing paths or devices.

PXELINUX 6.04 PXE Copyright (C) 1994-2015 H. Peter Anvin et al
No DEFAULT or UI configuration directive found!
boot: ls
Loading ls... failed: No such file or directory
boot: unset root
Loading unset... failed: No such file or directory

I’m pretty sure I’ve just neglected to include packages/files that are no longer bundled or pulled in as dependencies like they previously were, mainly because this has been the problem for 99% of all migration issues I’ve had since going to EL8 (payloads moving to new package names, etc). For example, on the pxeboot side of things it’s normal for /boot/loader/entries/ to show up correctly, for /boot/grub2 and /boot/efi/EFI/rocky to be empty, and for /boot/efi/EFI/BOOT to be missing. So when transferring to local storage you’d add grub2-efi, grub2-efi-modules, shim-x64, and the appropriate grub2-efi-{stub}-modules (.noarch) at a minimum. But this still only provides the /boot/efi/EFI/BOOT/ bits and /boot/efi/EFI/rocky/{grubx64,mmx64,shimx64,shimx64-rocky}.efi; the rest is still missing. It gets to the boot: prompt but can’t load anything for any command, even “set”, so clearly I’m still missing major parts of the infrastructure. Copying from a system with the same packages and versions installed locally (plus the usual depmods, mkinitrds, grub2-mkconfigs, etc, to correct for the fact that the source booted pxe and the dest will boot efi) doesn’t alter the behavior either. The stock docs aren’t particularly helpful beyond “does the referenced path actually exist” (they do). Any ideas what other content I’m missing?
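To make the "what is actually there" question concrete, here is a rough sketch of a presence check over the paths named above. The helper name check_efi_payload and the default /mnt root are my own placeholders, and the list only covers files mentioned in this post, so it is not a complete inventory of what a working EL8 EFI boot needs.

```shell
#!/bin/sh
# Sketch only: report which of the boot paths named in this post exist
# under a given root. check_efi_payload is a hypothetical helper name and
# /mnt is an assumed default for the root of the copied tree.
check_efi_payload() {
    root=${1:-/mnt}
    missing=0
    for f in \
        boot/loader/entries \
        boot/grub2/grub.cfg \
        boot/efi/EFI/BOOT \
        boot/efi/EFI/rocky/grub.cfg \
        boot/efi/EFI/rocky/grubx64.efi \
        boot/efi/EFI/rocky/mmx64.efi \
        boot/efi/EFI/rocky/shimx64.efi \
        boot/efi/EFI/rocky/shimx64-rocky.efi
    do
        if [ -e "$root/$f" ]; then
            echo "present: /$f"
        else
            echo "MISSING: /$f"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status when anything in the list is absent.
    [ "$missing" -eq 0 ]
}
```

Running check_efi_payload / on a locally installed reference box and check_efi_payload /mnt on the copied tree gives two lists that can be diffed.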

Background:

The system(s) in question are primarily pxebooted with the old saw of a readonly nfsroot plus read/write nfs fs slices for /etc, /root, /var, and so on, for times when the system images (the aggregate content in the server-side nfsroot’s and fs slices’ file trees) change often across many downstream provisioned clients. They also have local block storage for other production times when the system images are relatively static (making performance the greater priority). For reasons I’m not at liberty to disclose in a public forum, the physical systems’ SMBIOS needs to be set to only allow pxebooting from the first ether device. My go-to solution for several decades has been to just (scripted)…

  • pxeboot the target box and log in
  • set up the appropriate local block device with disklabel, partitions, and filesystems
  • temporarily mount all local logical devices under /mnt and rsync / to /mnt with the usual omissions (/dev, /sys, /proc, …)
  • bindmount the usual /proc, /dev, /dev/pts, /sys, /run into the tree under /mnt
  • chroot into /mnt, reinstall grub2-efi grub2-efi-modules shim-x64, and grub2-mkconfig into /boot/grub2/grub.cfg and /boot/efi/EFI/rocky/grub.cfg
  • exit the chroot, tear down the mounts, shut down the system
  • flip the corresponding pxeconfig from pointing to upstream tftp-provided kernel and initramfs to something like

set default=0
set timeout=10
menuentry "Boot from local disk" {
    search --file --no-floppy --set=root /EFI/redhat/grubx64.efi
    chainloader ($root)/EFI/redhat/grubx64.efi
}

  • hit the box’s BMC to power the system back on
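For reference, the copy, bind-mount, and chroot steps above look roughly like the sketch below. This is not my production script: the function names, the exclude list, and the DRY_RUN switch are illustrative, the package names are the ones quoted in this post, and the disk partitioning step is left out because it is site-specific. With DRY_RUN=1 (the default here) each command is only printed. Note that the pts filesystem is bind-mounted from /dev/pts.

```shell
#!/bin/sh
# Sketch of the copy/bind-mount/chroot steps. All names are illustrative.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Copy the running root into the mounted target, skipping pseudo filesystems.
# No --one-file-system here, since /etc, /root, /var are separate nfs mounts.
copy_root() {
    target=${1:-/mnt}
    run rsync -aHAX \
        --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' \
        --exclude='/run/*' --exclude='/mnt/*' \
        / "$target/"
}

# Bind the kernel filesystems into the target (dev before dev/pts).
bind_mounts() {
    target=${1:-/mnt}
    for fs in proc dev dev/pts sys run; do
        run mount --bind "/$fs" "$target/$fs"
    done
}

# Reinstall the boot packages and regenerate both grub.cfg files in the chroot.
install_bootloader() {
    target=${1:-/mnt}
    run chroot "$target" dnf -y reinstall grub2-efi grub2-efi-modules shim-x64
    run chroot "$target" grub2-mkconfig -o /boot/grub2/grub.cfg
    run chroot "$target" grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg
}

# Tear the bind mounts down in reverse order.
unbind_mounts() {
    target=${1:-/mnt}
    for fs in run sys dev/pts dev proc; do
        run umount "$target/$fs"
    done
}
```

Called in order (copy_root /mnt; bind_mounts /mnt; install_bootloader /mnt; unbind_mounts /mnt), the dry run prints the full command sequence so it can be compared against what the real script does.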

Prior to EL8 the only complication was that, because bios firmware boot options were limited to a single pxe device, I couldn’t “localboot 0”, etc. But chaining worked fine, at least through the end of EL6 and maybe as recently as 7.4(?).

Oddly enough (go figure) part of the problem was a bad syslinux install on the upstream server. So a reinstall of that and a fallback to a minimal pxeconfig of

default vesamenu.c32
prompt 1
timeout 60
label Local
menu label Boot from ^Local drive
localboot 0xffff

improves things slightly – it now at least shows some sign of processing both old and new (menu/menuentry) syntax. Letting the timeout run or hitting enter does produce the “Booting from local disk” message from ‘localboot’, so at least that appears to begin(?) loading. It still doesn’t display the menu itself though (including with menu.c32), still throws Loading {command}... failed: No such file or directory even for commands like ‘set’, and hangs indefinitely after the localboot “Booting from local disk” message.
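Since a bad syslinux install was already part of the problem, a quick sanity check of the TFTP directory may be worth having. The sketch below assumes a Syslinux 6.x BIOS layout, where pxelinux.0 needs ldlinux.c32 next to it and the menu modules need libutil.c32 (plus libcom32.c32 for vesamenu.c32). The helper name and the /var/lib/tftpboot default are assumptions, not necessarily how the upstream server here is laid out.

```shell
#!/bin/sh
# Sketch only: list Syslinux 6.x modules a menu config would load that are
# absent from a TFTP directory. check_syslinux_modules is a hypothetical
# helper; /var/lib/tftpboot is an assumed default location.
check_syslinux_modules() {
    tftproot=${1:-/var/lib/tftpboot}
    missing=0
    for f in pxelinux.0 ldlinux.c32 libcom32.c32 libutil.c32 \
             menu.c32 vesamenu.c32
    do
        if [ ! -f "$tftproot/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status when any module is absent.
    [ "$missing" -eq 0 ]
}
```

This only confirms the files exist in one directory; it says nothing about whether they all come from the same syslinux version.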