Dnf upgrade from 9.0 to 9.1 failed

Before dnf upgrade

5.14.0-70.17.1.el9_0.x86_64

During dnf upgrade

2022-12-15T11:54:14+0000 INFO dracut: Can't write to /boot/efi/70135c696ca547939f1864d32e1cbf74/5.14.0-162.6.1.el9_1.0.1.x86_64: Directory /boot/efi/70135c696ca547939f1864d32e1cbf74/5.14.0-162.6.1.el9_1.0.1.x86_64 does not exist or is not accessible.

After dnf upgrade:
The new kernel for 9.1 has been installed, but is missing from the GRUB menu, so it currently boots into the previous kernel from 9.0.
It looks like all packages for 9.1 have been installed, but I can't boot into the new kernel…
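To see which kernels the boot loader actually knows about, grubby (if installed, as it is on a default el9 system) can list every entry:

grubby --info=ALL | grep -E '^(kernel|title)'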


Why did dracut try to write to such a non-existent location?

The EFI System Partition (ESP), a VFAT filesystem, is mounted at /boot/efi.
I have never seen anything but the directory EFI in /boot/efi/.
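One can verify the mount point and filesystem type with findmnt (the source device will of course differ per system):

findmnt -o TARGET,SOURCE,FSTYPE /boot/efi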

Dracut was probably called in the 'postinstall' script of some kernel* package.
The rest of the script might have been skipped due to what dracut returned.
That could explain the lack of a boot entry.

A trivial thing to try is to remove & install (or reinstall) the 5.14.0-162* kernel packages.
That would (re)create the boot entry, if successful.
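A sketch of that, using the versions from the error above (check the exact names with rpm -qa 'kernel*'):

dnf reinstall kernel-core-5.14.0-162.6.1.el9_1.0.1 kernel-modules-5.14.0-162.6.1.el9_1.0.1 kernel-5.14.0-162.6.1.el9_1.0.1

Reinstalling reruns the packages' scriptlets, which is what should recreate the initramfs and the boot entry.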

However, if the dracut issue persists, then you have to diagnose & fix it first.

The location (that it was trying to write to) didn't look right to me either, but the top of 'man dracut' implies that it could be a valid location, depending on the boot loader you are using. All my previous initramfs files were directly under '/boot', so I don't know.

Removing all the 9.1 kernel packages and then running 'dnf upgrade' again seems to have fixed everything, but I don't know what the original problem was.

Under '/boot' I now have:

config-5.14.0-162.6.1.el9_1.0.1.x86_64
initramfs-5.14.0-162.6.1.el9_1.0.1.x86_64.img
initramfs-5.14.0-162.6.1.el9_1.0.1.x86_64kdump.img
symvers-5.14.0-162.6.1.el9_1.0.1.x86_64.gz
System.map-5.14.0-162.6.1.el9_1.0.1.x86_64
vmlinuz-5.14.0-162.6.1.el9_1.0.1.x86_64

Two points related to the bootloader (GRUB) config and hence its menu:

  • The default is to use BLS (Boot Loader Specification). That is, each menu entry is a separate file in the directory /boot/loader/entries/, not in grub.cfg. (One can disable BLS to get back to the "all in one file" config.)
  • In el9 on a UEFI system, the firmware loads the GRUB binary from /boot/efi/EFI/${vendor}/ and grub.cfg from the same location. The difference from el8 is that this grub.cfg is really short and essentially just loads the real config (via GRUB's configfile command, with $prefix set to point at /boot/grub2):
    configfile $prefix/grub.cfg
    The "real grub.cfg" is thus /boot/grub2/grub.cfg, just like on legacy BIOS systems. Should you need to run grub2-mkconfig, remember to direct the output to /boot/grub2/grub.cfg; see the example after this list.
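That is:

grub2-mkconfig -o /boot/grub2/grub.cfg

(On el8 UEFI the target was the grub.cfg on the ESP; pointing grub2-mkconfig there on el9 would overwrite the short stub.)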

In my case, I can see that '/boot/grub2/grub.cfg' has not been changed since August, but '/boot/grub2/grubenv' has been updated today, when I ran dnf upgrade. I don't know if it updates grub before or after creating the initramfs, but I guess it should leave grub until last.
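One can see what grubenv holds (typically the saved default entry) with:

grub2-editenv list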

How about the files in /boot/loader/entries/?

Boot loader entries:

ls /boot/loader/entries/
70135c696ca547939f1864d32e1cbf74-0-rescue.conf
70135c696ca547939f1864d32e1cbf74-5.14.0-162.6.1.el9_1.0.1.x86_64.conf
70135c696ca547939f1864d32e1cbf74-5.14.0-70.13.1.el9_0.x86_64.conf
70135c696ca547939f1864d32e1cbf74-5.14.0-70.17.1.el9_0.x86_64.conf

At the time of the original post, the 9.1 entry was not there; I can see it was added today, after running dnf upgrade.

Those are your menu entries.
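Each of those .conf files is a plain-text BLS entry. The 9.1 one should look roughly like this (illustrative; the exact title and options differ per system):

title Rocky Linux (5.14.0-162.6.1.el9_1.0.1.x86_64) 9.1
version 5.14.0-162.6.1.el9_1.0.1.x86_64
linux /vmlinuz-5.14.0-162.6.1.el9_1.0.1.x86_64
initrd /initramfs-5.14.0-162.6.1.el9_1.0.1.x86_64.img
options root=... ro ...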

Just to be absolutely sure, I have two steps in my Ansible "update" play:

  1. update everything, except kernel & co
  2. update everything

That way, all the packages (except kernel modules) that could go into the initramfs, or are otherwise involved in the installation of the kernel, are definitely already updated when the new kernel is installed in step two.

Not sure whether that has any real effect or benefit other than the false sense of being in control.

Of all the different %post scripts, across all the different kernel packages, do you know which one actually invokes dracut to create the '/boot' initramfs files?

One can check:

$ rpm -qa kernel\* | while read P ; do echo "### $P ###" ; rpm -q --scripts ${P} | grep -m1 dracut ; done
### kernel-srpm-macros-1.0-11.el9.noarch ###
### kernel-core-5.14.0-70.26.1.el9_0.x86_64 ###
### kernel-modules-5.14.0-70.26.1.el9_0.x86_64 ###
### kernel-5.14.0-70.26.1.el9_0.x86_64 ###
### kernel-devel-5.14.0-70.26.1.el9_0.x86_64 ###
### kernel-core-5.14.0-70.30.1.el9_0.x86_64 ###
### kernel-modules-5.14.0-70.30.1.el9_0.x86_64 ###
### kernel-5.14.0-70.30.1.el9_0.x86_64 ###
### kernel-devel-5.14.0-70.30.1.el9_0.x86_64 ###
### kernel-core-5.14.0-162.6.1.el9_1.x86_64 ###
### kernel-modules-5.14.0-162.6.1.el9_1.x86_64 ###
	touch /var/lib/rpm-state/kernel/need_to_run_dracut_5.14.0-162.6.1.el9_1.x86_64
### kernel-tools-libs-5.14.0-162.6.1.el9_1.x86_64 ###
### kernel-tools-5.14.0-162.6.1.el9_1.x86_64 ###
### kernel-5.14.0-162.6.1.el9_1.x86_64 ###
### kernel-headers-5.14.0-162.6.1.el9_1.x86_64 ###
### kernel-devel-5.14.0-162.6.1.el9_1.x86_64 ###
### kernel-devel-matched-5.14.0-162.6.1.el9_1.x86_64 ###

Fascinating: only the scripts of the el9_1 kernel-modules package appear to contain the word "dracut"; the el9_0 packages did not. That is thus a new feature.

However, I'd guess that /bin/kernel-install, called by 'kernel-core', might be involved.
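One can grep for that too (a sketch; adjust the version to whatever is installed):

rpm -q --scripts kernel-core-5.14.0-162.6.1.el9_1.x86_64 | grep -n kernel-install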

Is "1. update everything, except kernel & co" equivalent to

# yum update --exclude=kernel*

There are a couple of packages so tightly coupled to the kernel that they have to be on the exclude list too:

dnf up --exclude=vdo,kmod-kvdo,kernel*

or, as I have it in Ansible tasks:

  - name: Update all, except kernel
    ansible.builtin.dnf:
      name: '*'
      state: latest
      exclude: 'vdo,kmod-kvdo,kernel*'

  - name: Update all, including kernel
    ansible.builtin.dnf:
      name: '*'
      state: latest

No reboot between steps 1. and 2. Correct?

Yes.

Now that you mention it, an upgrade of some libraries (e.g. glibc) would warrant a reboot, but I'm not that paranoid. See "Identify packages that will require a system reboot after an update" on the Red Hat Customer Portal.
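The needs-restarting tool used below signals the need for a reboot via its exit code:

needs-restarting -r
echo $?   # 0: no reboot needed, 1: reboot advised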

As for the reboot later:

  - name: We will use 'needs-restarting'
    ansible.builtin.dnf:
      name: 'yum-utils'
      state: present

  - name: Check need for reboot
    check_mode:   false
    changed_when: false
    ansible.builtin.shell: "needs-restarting -r > /dev/null && echo false || echo true"
    register: needs_reboot

  - ansible.builtin.debug:
      msg: "A reboot is required"
    when: "'true' in needs_reboot.stdout"

  - name: Gather the package facts
    ansible.builtin.package_facts:
    tags:
    - reboot

  - block:
    - name: Get running VMs
      check_mode:   false
      changed_when: false
      ansible.builtin.command: virsh list --name
      register: virsh_list

    - block:
      - name: List running guests
        ansible.builtin.debug:
          var: virsh_list.stdout_lines
        when:
        - virsh_list.stdout_lines|length > 0

      - name: Reboot VM host that has no guests
        ansible.builtin.reboot:
        when:
        - "'true' in needs_reboot.stdout"
        - virsh_list.stdout_lines|length == 0

      when:
      - virsh_list.stdout_lines is defined
    when:
    - "'libvirt-client' in ansible_facts.packages"
    tags:
    - reboot

  - name: Reboot non-host
    ansible.builtin.reboot:
    when:
    - "'libvirt-client' not in ansible_facts.packages"
    - "'true' in needs_reboot.stdout"
    tags:
    - reboot

(But that could be better still.)
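One refinement I would consider (my suggestion, not part of the play above): ansible.builtin.reboot waits 600 seconds by default for the host to come back, which slow hardware can exceed, so the reboot tasks could set a larger bound:

  - name: Reboot non-host
    ansible.builtin.reboot:
      reboot_timeout: 1200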


Awesome! I think you just convinced me to start using Ansible.

Maybe it only tries to use 'dracut' when things go wrong (as a fallback). There must be some "normal" way that it has been creating initramfs files for years, before this change. I did notice there was a long pause while it was running 'kmod-kvdo', and perhaps it ended up in a half-finished state, but I did run 'dnf check' and it said everything was OK.
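If someone wants to dig further: kernel-install works by running the plugin scripts in its install.d directories (dracut ships one of them), so listing that directory shows everything that takes part in a kernel installation:

ls /usr/lib/kernel/install.d/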