Dnf failure while installing kernel-core 5.14.0-427

I see the following when installing (or re-installing) the latest kernel update:

Running scriptlet: kernel-core-5.14.0-427.18.1.el9_4.x86_64    15/18
/etc/default/grub: line 62: terminal_output: command not found
insmod: ERROR: could not load module increment: No such file or directory
insmod: ERROR: could not load module part_gpt: No such file or directory
insmod: ERROR: could not load module xfs: No such file or directory
/etc/default/grub: line 104: search: command not found
insmod: ERROR: could not load module part_gpt: No such file or directory
insmod: ERROR: could not load module fat: No such file or directory
/etc/default/grub: line 107: search: command not found
insmod: ERROR: could not load module blscfg: No such file or directory
/etc/default/grub: line 129: blscfg: command not found
/etc/default/grub: line 148: save_env: command not found
/etc/default/grub: line 197: syntax error near unexpected token `}'
/etc/default/grub: line 197: `  }'
/etc/default/grub: line 62: terminal_output: command not found
insmod: ERROR: could not load module increment: No such file or directory
insmod: ERROR: could not load module part_gpt: No such file or directory
insmod: ERROR: could not load module xfs: No such file or directory
/etc/default/grub: line 104: search: command not found
insmod: ERROR: could not load module part_gpt: No such file or directory
insmod: ERROR: could not load module fat: No such file or directory
/etc/default/grub: line 107: search: command not found
insmod: ERROR: could not load module blscfg: No such file or directory
/etc/default/grub: line 129: blscfg: command not found
/etc/default/grub: line 148: save_env: command not found
/etc/default/grub: line 197: syntax error near unexpected token `}'
/etc/default/grub: line 197: `  }'

Based on evidence in /var/log/dnf.rpm.log this has been a problem for me going back to 9.2, but I’m just now noticing.

When I run grub2-mkconfig I get the following error:

# grub2-mkconfig
/etc/default/grub: line 62: terminal_output: command not found

This appears to be due to “corruption” in /etc/default/grub. At some point the file seems to have been overwritten with the contents of /boot/grub2/grub.cfg, since the two files are nearly identical on my system.
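In case it helps anyone check for the same symptom, here is a rough sketch that flags lines in a defaults file that are not blank, comments, or plain VAR=value assignments. The function name non_assignment_lines is made up for illustration, and the pattern assumes a simple defaults file; a legitimate file could contain other shell syntax, so treat any hits as a hint rather than proof.

```shell
#!/bin/sh
# Sketch only: non_assignment_lines is a hypothetical helper, not a GRUB tool.
# It prints (with line numbers) every line that is NOT blank, NOT a comment,
# and NOT a simple VAR=value assignment.
non_assignment_lines() {
    # -n: show line numbers, -v: invert the match, -E: extended regex
    grep -nvE '^[[:space:]]*(#.*)?$|^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=' "$1"
}

# Example on the affected machine (as root):
#   non_assignment_lines /etc/default/grub
#   diff -u /etc/default/grub /boot/grub2/grub.cfg | head
```

A healthy file should produce no output from the first command, while a copy of grub.cfg would list lines such as `terminal_output console` or `insmod part_gpt`.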

A couple of questions:

  1. What should be in /etc/default/grub?

  2. What is the recommended procedure to fully recover from this situation?

I can see that the updated kernels for RL 9.3 and 9.4 are installed, but the GRUB boot menu offers only the 9.2 entry, which does boot successfully. After boot:

# uname -a
Linux blackrock 5.14.0-284.11.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 9 17:09:15 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

dnf reports these installed kernels:

# dnf list kernel
Last metadata expiration check: 2:05:00 ago on Wed 29 May 2024 11:01:36 AM EDT.
Installed Packages
kernel.x86_64         5.14.0-284.11.1.el9_2     @baseos
kernel.x86_64         5.14.0-427.16.1.el9_4     @baseos
kernel.x86_64         5.14.0-427.18.1.el9_4     @baseos
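For anyone wanting to spot this mismatch earlier than I did, a small sketch that compares the running kernel with the newest installed one. The helper name newest_version is hypothetical, and it relies on GNU sort's version ordering, which I assume agrees with RPM's ordering for ordinary kernel version strings like these.

```shell
#!/bin/sh
# Sketch only: newest_version is a hypothetical helper, not part of dnf or rpm.
# It reads one version string per line and prints the highest one,
# relying on GNU sort's version ordering (-V).
newest_version() {
    sort -V | tail -n 1
}

# Example on the affected machine (prints a hint when the two differ):
#   newest=$(rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | newest_version)
#   [ "$newest" = "$(uname -r)" ] || echo "running $(uname -r), newest installed is $newest"
```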

/etc/default/grub shouldn’t have the same contents as the config under /boot. It is /etc/grub2.cfg that normally matches /boot/grub2/grub.cfg, because it is a symlink to it.

[root@rocky9 ~]# file /etc/grub2.cfg 
/etc/grub2.cfg: symbolic link to ../boot/grub2/grub.cfg
[root@rocky9 ~]# file /etc/grub2-efi.cfg 
/etc/grub2-efi.cfg: symbolic link to ../boot/grub2/grub.cfg

The contents of your /etc/default/grub should be similar to this:

[root@rocky9 ~]# cat /etc/default/grub 
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX=""
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
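The "command not found" errors in the original post suggest that grub2-mkconfig reads this file as a shell script, so a quick sanity check after restoring it is to let a shell parse it. Below is a minimal sketch; check_grub_defaults is a name I made up, and the second step really does execute the file, so only run it on a file you have already looked over.

```shell
#!/bin/sh
# Sketch only: check_grub_defaults is a hypothetical helper.
check_grub_defaults() {
    # -n parses the file without executing it, so this catches syntax errors only.
    sh -n "$1" || return 1
    # Source the file in a subshell (nothing leaks into the caller) and
    # print the GRUB_* variables it ends up defining.
    ( . "$1" && set | grep '^GRUB_' )
}

# Example, after restoring the file (use an absolute path):
#   check_grub_defaults /etc/default/grub
```

If the file is sane, this should print the GRUB_* settings and nothing else; a copied grub.cfg would be expected to fail at the parse step.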

Once you’ve replaced the contents with something like the above, I would probably run:

dnf reinstall kernel
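Before rebooting, it may be worth confirming that the boot menu will actually offer the new kernels. A rough sketch follows, assuming the system uses BLS entries under /boot/loader/entries (which GRUB_ENABLE_BLSCFG=true implies); list_bls_titles is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# Sketch only: list_bls_titles is a hypothetical helper.
# Prints the "title" line of every Boot Loader Specification entry in a
# directory (default: /boot/loader/entries, readable as root).
list_bls_titles() {
    dir="${1:-/boot/loader/entries}"
    for entry in "$dir"/*.conf; do
        [ -f "$entry" ] || continue      # skip if the glob matched nothing
        sed -n 's/^title[[:space:]]*//p' "$entry"
    done
}

# Typical use as root, after fixing /etc/default/grub:
#   grub2-mkconfig -o /boot/grub2/grub.cfg   # regenerate the real config
#   list_bls_titles                          # entries the menu should offer
#   grubby --default-kernel                  # kernel that boots by default
```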

I restored my /etc/default/grub to something similar to your recommendation (different timeout and did not disable recovery), then reinstalled all the kernel components:

# dnf reinstall kernel*

This does seem to have resolved the issue and allowed me to boot into RL 9.4 with the new kernel:

$ uname -sr
Linux 5.14.0-427.18.1.el9_4.x86_64

I’m very curious how I got into this situation, but hopefully this will help someone else who has a similar issue.


In my case, what has recently been causing broken kernel installs is Nvidia’s DNF plugin.

It would be good to know the answer.

Did you watch the whole dnf update line by line as it was happening, checking for errors as you go? It’s possible it didn’t complete properly at some point in the past.

I doubt a dnf update did this; otherwise others would have run into it by now. Checking your bash history might shed some light. I have a feeling that grub.cfg was copied over /etc/default/grub by mistake; it’s about the only possibility I can think of.

Then again, weirder things have happened.

Yes, I agree it’s “weird” and I don’t completely discount the idea that I shot myself in the foot. But … since /etc/default/grub is not an exact copy of /boot/grub2/grub.cfg (one has a timeout of 6 seconds, and the other has 5 seconds) it doesn’t really smell like something I would have done consciously.

Based on the dnf.rpm.log, this problem appears to have begun over a year ago, so it seems exceedingly unlikely that bash history will provide any enlightenment. I’m just a bit surprised that it took me this long to notice it, which leads me to conclude that even if others HAVE had this problem, they might not have noticed it, or (which seems more likely) not reported it.

I’ll dig into the dnf log later tonight. If there’s any gold in there, I’ll report back.
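For reference, this is roughly how I plan to look for the earliest occurrence. It is only a sketch: first_grub_error is a name I made up, it assumes the scriptlet errors were logged verbatim, and I have not checked how the log lines are prefixed, so the pattern matches only the error text itself.

```shell
#!/bin/sh
# Sketch only: first_grub_error is a hypothetical helper.
# For each log file given, print the first line that mentions an error
# raised while /etc/default/grub was being read.
first_grub_error() {
    # -H: always prefix the file name, -m1: stop after the first match per file
    grep -H -m1 '/etc/default/grub: line' "$@"
}

# Example (as root; the glob also picks up rotated copies, if any exist):
#   first_grub_error /var/log/dnf.rpm.log*
```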
