Upgrade to 8.7 fails to boot with ZFS installed

After upgrading to 8.7 with zfs-kmod installed, the kernel fails to boot (even to single-user mode) with messages like:
BUG: soft lockup - CPU#0 stuck for 22s!
and
udev wait for Complete Device Initialization

Booting from an earlier kernel works, and switching to the dkms version of ZFS also works.
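
For anyone who wants the older kernel to stay the default until this is sorted out, here is a minimal sketch using grubby. The helper name `pin_kernel_cmd` is made up for illustration, and the kernel release in the example is only a placeholder; substitute one that is actually installed under /boot. The function only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the grubby command that would
# make the given kernel release the default boot entry.
pin_kernel_cmd() {
    kver="$1"   # kernel release, e.g. taken from `ls /boot/vmlinuz-*`
    printf 'grubby --set-default /boot/vmlinuz-%s\n' "$kver"
}

# Example with a placeholder 8.6 kernel release (an assumption, not from this thread):
pin_kernel_cmd 4.18.0-372.32.1.el8_6.x86_64
```

`grubby --default-kernel` shows which kernel is currently the default.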

See also: “kmod-zfs fails to load on Rocky Linux 8.7 kernel”, Issue #14189 in openzfs/zfs on GitHub.

I have a couple of thoughts.

  1. I’ve upgraded to 8.7 with ZFS on a few systems and all is good so far. You don’t say what version of ZFS you have.
  2. Knowing that the old kernel works is a great first step. Someone might want to check Red Hat’s Bugzilla to see if others are having soft lockup errors with this kernel.
  3. In the past, when I’ve had soft lockup errors that weren’t actually caused by a bad processor, it was because I needed a firmware update to match the microcode update. This is slightly less likely since you were able to boot into an old kernel just fine, but it might be worth looking into.

It’s the latest version of ZFS: kmod-zfs-2.1.6-1. Are you using the 4.18.0-425.3.1 kernel?

$ cat /etc/rocky-release
Rocky Linux release 8.7 (Green Obsidian)
$ rpm -qa | grep -i zfs
zfs-dkms-2.1.6-1.el8.noarch
libzfs5-2.1.6-1.el8.x86_64
zfs-release-2-2.el8.noarch
zfs-2.1.6-1.el8.x86_64
$ uname -r
4.18.0-425.3.1.el8.x86_64

Hopefully this helps.

So it does seem that RH has some open bugs about “BUG: soft lockup”, but none of them involve ZFS. Do you happen to have kernel panic logs to dig through? Maybe there’s something more in them that points to a cause?

How does the kmod-zfs operate? Does it ship a pre-compiled kernel module for (some version of) the kernel, or does it build the kernel module on your machine for your kernel?

For comparison, ELRepo does pre-compile kmods, in such a way that they work with all kernels of a point release.
That is, their “el8_6” modules work with all 4.18.0-372.*.el8 (of 8.6), while the 4.18.0-425.*.el8 (of 8.7) require a rebuild.
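
To make that mapping concrete, here is a small sketch that classifies a kernel release string by point release, using only the two version families mentioned above. The function name `el8_point_release` is made up for illustration, and anything outside those two families is reported as unknown.

```shell
#!/bin/sh
# Hypothetical helper: map a kernel release string to the EL8 point
# release whose kernel family it belongs to (only 8.6 and 8.7 are covered).
el8_point_release() {
    case "$1" in
        4.18.0-372.*) echo "8.6" ;;
        4.18.0-425.*) echo "8.7" ;;
        *)            echo "unknown" ;;
    esac
}

# Classify the running kernel.
el8_point_release "$(uname -r)"
```

A prebuilt kmod built for one family should not be expected to load on a kernel from the other.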

Furthermore, for locally built (dkms?) modules, and since we are talking storage, the module has to be built for 4.18.0-425.*.el8 while still running a 4.18.0-372.*.el8 kernel, and then injected into the initramfs image of the 4.18.0-425.*.el8 kernel.
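
A rough sketch of what that would look like by hand, assuming the module is registered with dkms under the name `zfs` and that the initramfs lives at the usual /boot/initramfs-<release>.img path. The helper name `plan_zfs_rebuild` is made up; it only prints the two commands so they can be checked and then run as root.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that would build the
# ZFS module for a kernel that is not currently booted and then regenerate
# that kernel's initramfs so the new module is included.
plan_zfs_rebuild() {
    zfs_ver="$1"   # module version, e.g. as reported by `dkms status`
    kver="$2"      # target kernel release, not the running one
    printf 'dkms install zfs/%s -k %s\n' "$zfs_ver" "$kver"
    printf 'dracut --force /boot/initramfs-%s.img %s\n' "$kver" "$kver"
}

# Versions taken from the output posted earlier in this thread.
plan_zfs_rebuild 2.1.6 4.18.0-425.3.1.el8.x86_64
```

Normally dkms does this automatically when a new kernel is installed; the manual steps are only a fallback if that did not happen.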

If the current kmod-zfs is prebuilt (for 8.6) by the zfs repo maintainers, then you have to wait for them to build a new kmod-zfs for 8.7.

Or so I understand the situation.

Ah, you are using ZFS with dkms, which works OK.

I don’t think “BUG: soft lockup” is the problem; rather, it’s a side effect. I think the problem is that udev hangs when it discovers a ZFS-formatted partition and tries to load the zfs module.

I think you are right. We need to wait for the kmod-zfs to be updated for 8.7.

I wanted to highlight the cause of the hang and to document some workarounds, because it’s not immediately obvious that it has anything to do with ZFS.

I hadn’t noticed, but this problem has been fixed now, though you will probably have to remove and re-install zfs.