Shutdown gets into an infinite loop on Rocky Linux 9.1

When I run shutdown, I run into the following infinite loop:

watchdog: watchdog0: watchdog did not stop!
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
blkdev_get_no_open: 277 callbacks suppressed
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
blkdev_get_no_open: 280 callbacks suppressed

… It repeats itself forever…

Please help!

Looks like Red Hat backported a breaking change from 5.19:

* Tue Jul 12 2022 Patrick Talbert <ptalbert@redhat.com> [5.14.0-128.el9]
...
- block: freeze the queue earlier in del_gendisk (Ming Lei) [2094256]
...

https://lkml.org/lkml/2022/8/31/582

Looks like there is a revert commit, but that hasn’t been pulled into the Red Hat (and therefore Rocky) 5.14 kernel yet.

https://lkml.org/lkml/2022/9/26/108

What exact kernel version are you running when you see the shutdown problem?

Just checked the latest CentOS Stream 9 kernel 5.14.0-237.el9. The revert commit is not there. Someone needs to report this issue to Red Hat.

In our use case, we were able to consistently reproduce the hang whenever there was an active array that wasn’t auto-assembled during startup. If we stopped all but the boot and OS arrays before reboot, it’d restart just fine. If we didn’t stop the extra arrays, it’d hang.

I don’t know if there’s some magic incantation for /etc/mdadm.conf that will work around this bug, but we were able to mitigate it by adding a script unit to shutdown that unwinds the md/LVM/filesystem stack.

I’m curious to know if our situation is similar to others experiencing this issue. If so, it might be easier to patch the shutdown process in Rocky than to wait on RH to backport a reversion.
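
For anyone who wants to try the same kind of mitigation, here is a rough sketch of what the teardown step could look like. It is not the script mentioned above, which was not posted. It assumes the arrays to keep are named `md0` and `md1` (hypothetical; substitute your boot and OS arrays), and the mount points and volume groups are placeholders you would have to supply yourself. It only prints the commands unless `dry_run` is turned off, and hooking it into a systemd unit that runs at shutdown is left out.

```python
import os
import re
import subprocess

# Assumption: these are the boot and OS arrays that must stay assembled.
KEEP = {"md0", "md1"}


def active_arrays(mdstat_text):
    """Return the names of active md arrays listed in /proc/mdstat text."""
    return re.findall(r"^(md\w+)\s*:\s*active", mdstat_text, flags=re.MULTILINE)


def plan_teardown(mdstat_text, keep=KEEP, mountpoints=(), volume_groups=()):
    """Build the commands top-down: filesystems, then LVM, then md arrays."""
    cmds = [["umount", mp] for mp in mountpoints]
    cmds += [["vgchange", "-an", vg] for vg in volume_groups]
    cmds += [
        ["mdadm", "--stop", f"/dev/{name}"]
        for name in active_arrays(mdstat_text)
        if name not in keep
    ]
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)


if __name__ == "__main__" and os.path.exists("/proc/mdstat"):
    with open("/proc/mdstat") as f:
        run(plan_teardown(f.read()))
```

The order matters: a filesystem has to be unmounted before its volume group can be deactivated, and the volume group has to be inactive before `mdadm --stop` can release the array underneath it.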

Another workaround is to use ELRepo’s kernel-ml.

I have now built a set of 9.1 kernel packages with the revert commit applied. They are available here:

https://toracat.org/test/kernel/4c66a32/

Please test if you can. Note that they are not signed and are offered for testing purposes only.

I have exactly the same error lines and shutdown problems on my Rocky 9.1 workstation, running kernel-5.14.0-162.6.1.el9_1.0.1.x86_64.
I use disks in RAID1 using mdadm.
The machine just doesn’t shut down when I use the GNOME “shutdown” button in the top right-hand corner of the screen, or run “shutdown -h 0” as root.
I need to manually push and hold the power button of the machine. Not a good way, and easy to forget…
I expect many users have this issue?
After the fresh install, it did shut down properly, if I remember correctly. Then I installed several packages I need. No idea whether any of them could be part of the problem, or which one…

An AlmaLinux user who was hit by the same issue tested with the patched kernel and reported back with a successful result:

https://bugs.almalinux.org/view.php?id=359

Perhaps having the shutdown scripts unwind the md/LVM/filesystem stack would be good practice even without this bug?

Is this fix already included in the current Rocky Linux 9.1 kernel, which is 5.14.0-162.23.1.el9_1.x86_64?

The fix is in the 9.2 GA kernel 5.14.0-284.11.1.el9_2. So you’d need to update your kernel to 9.2.
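
If you want to check a machine quickly, something like the following could compare the running kernel against that 9.2 GA release. This is only a sketch: `release_key` and `has_fix` are names made up for illustration, and the comparison just looks at the numeric fields before `.el`. A result of `False` means the kernel is older than 5.14.0-284.11.1.el9_2, not that it is definitely affected, since a custom build may carry the revert.

```python
import platform
import re

# First release reported in this thread to contain the fix.
FIXED_RELEASE = "5.14.0-284.11.1.el9_2"


def release_key(release):
    """Turn '5.14.0-162.23.1.el9_1.x86_64' into (5, 14, 0, 162, 23, 1)."""
    base = release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", base))


def has_fix(release):
    """True if the release sorts at or above the fixed 9.2 GA kernel."""
    return release_key(release) >= release_key(FIXED_RELEASE)


if __name__ == "__main__":
    running = platform.release()
    status = "at or above" if has_fix(running) else "older than"
    print(f"{running} is {status} {FIXED_RELEASE}")
```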
