After upgrading to 8.7 with zfs-kmod installed, the kernel fails to boot (even to single-user mode) with messages like:
BUG: soft lockup - CPU#0 stuck for 22s!
and
udev wait for Complete Device Initialization
Booting from an earlier kernel works, and switching to the dkms version of ZFS also works.
I’ve upgraded to 8.7 with ZFS on a few systems and all is good so far. You don’t say what version of ZFS you have.
Knowing that the old kernel works is a great first step. Someone might want to check Red Hat’s Bugzilla to see if others are having soft lockup errors with this kernel.
In the past, when I’ve had soft lockup errors that weren’t actually caused by a bad processor, it was because I needed a firmware update to match the microcode update. This is slightly less likely since you were able to boot into an old kernel just fine, but it might be worth looking into.
So it does seem that RH has some open bugs about “BUG: soft lockup”, but none of them involve ZFS. Do you happen to have kernel panic logs to dig through? Maybe there’s something more in them that points to a cause?
How does kmod-zfs operate? Does it ship a pre-compiled kernel module for (some version of) the kernel, or does it build the kernel module on your machine for your kernel?
For comparison, ELRepo pre-compiles its kmods in such a way that they work with all kernels of a point update.
That is, their “el8_6” modules work with all 4.18.0-372.*.el8 (of 8.6), while the 4.18.0-425.*.el8 (of 8.7) require a rebuild.
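To make that matching rule concrete, here is a rough Python sketch that maps a kernel release string to its point release using only the two build numbers mentioned above (372 for 8.6, 425 for 8.7). The function names and the lookup table are my own illustration, not anything ELRepo or the ZFS repo ships.

```python
import re

# Build numbers taken from this thread: 4.18.0-372.* is 8.6, 4.18.0-425.* is 8.7.
# Other point releases are deliberately left out of this illustrative table.
BUILD_TO_POINT_RELEASE = {372: "8.6", 425: "8.7"}

# Matches e.g. "4.18.0-372.9.1.el8.x86_64" and captures the build number (372).
_RELEASE_RE = re.compile(r"^4\.18\.0-(\d+)\.(?:.*\.)?el8")


def point_release(kernel_release):
    """Return "8.6" or "8.7" for a known EL8 kernel release string, else None."""
    match = _RELEASE_RE.match(kernel_release)
    if match is None:
        return None
    return BUILD_TO_POINT_RELEASE.get(int(match.group(1)))


def same_point_release(module_kernel, target_kernel):
    """True if a kmod built against module_kernel should fit target_kernel."""
    built_for = point_release(module_kernel)
    return built_for is not None and built_for == point_release(target_kernel)
```

Comparing the output of `uname -r` on the old kernel with the release string of the new one this way shows at a glance whether a prebuilt kmod can be expected to carry over.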
Furthermore, for locally built (dkms?) modules, since we are talking about storage, the module has to be built for 4.18.0-425.*.el8 while still running a 4.18.0-372.*.el8 kernel, and then injected into the initramfs image of the 4.18.0-425.*.el8 kernel.
If the current kmod-zfs is prebuilt (for 8.6) by the ZFS repo maintainers, then you have to wait for them to build a new kmod-zfs for 8.7.
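For the locally built case, the two steps (build for the new kernel, then regenerate its initramfs) could look roughly like the sketch below. It only assembles the command lines and prints them; it does not run anything. It assumes the zfs-dkms source is already registered with dkms, and the ZFS version and kernel release passed in are placeholders you would replace with your own values.

```python
import shlex


def rebuild_commands(zfs_version, target_kernel):
    """Return the commands to build ZFS for target_kernel and refresh its initramfs.

    Both arguments are supplied by the caller; nothing here detects them.
    """
    return [
        # Build and install the dkms module for a kernel other than the running one.
        ["dkms", "install", f"zfs/{zfs_version}", "-k", target_kernel],
        # Regenerate the initramfs of that kernel so it contains the new module.
        ["dracut", "--force", "--kver", target_kernel],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for command in rebuild_commands("ZFS_VERSION", "KERNEL_RELEASE"):
        print(shlex.join(command))
```

Run as root from the old, working kernel, those two commands should leave the new kernel with a matching module in its initramfs, but treat this as a starting point rather than a tested recipe.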
I don’t think “BUG: soft lockup” is the problem; rather, it’s a side effect. I think the problem is that udev hangs when it discovers a ZFS-formatted partition and tries to load the zfs module.
I think you are right. We need to wait for the kmod-zfs to be updated for 8.7.
I wanted to highlight the cause of the hang and to document some workarounds, because it’s not immediately obvious that it has anything to do with ZFS.