Shutdown gets into an infinite loop on Rocky Linux 9.1

bradcliu · December 1, 2022, 7:14pm

When I run shutdown, I run into the following infinite loop:

watchdog: watchdog0: watchdog did not stop!
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
blkdev_get_no_open: 277 callbacks suppressed
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
block device autoloading is deprecated and will be removed.
blkdev_get_no_open: 280 callbacks suppressed

… It repeats itself forever…

Please Help!

jshep · January 16, 2023, 5:37pm

It looks like Red Hat backported a breaking change from 5.19:

* Tue Jul 12 2022 Patrick Talbert <ptalbert@redhat.com> [5.14.0-128.el9]
...
- block: freeze the queue earlier in del_gendisk (Ming Lei) [2094256]
...

https://lkml.org/lkml/2022/8/31/582

There is a revert commit, but it hasn’t been pulled into the Red Hat (and therefore Rocky) 5.14 kernel yet.

https://lkml.org/lkml/2022/9/26/108

gerry666uk · January 16, 2023, 8:16pm

What exact kernel version are you running when you see the shutdown problem?

toracat · January 18, 2023, 8:24am

I just checked the latest CentOS Stream 9 kernel, 5.14.0-237.el9. The revert commit is not there. Someone needs to report this issue to Red Hat.

jshep · January 19, 2023, 4:02pm

In our use case, we were able to consistently reproduce the hang whenever there was an active array that wasn’t auto-assembled during startup. If we stopped all but the boot and OS arrays before reboot, it’d restart just fine. If we didn’t stop the extra arrays, it’d hang.

I don’t know if there’s some magic incantation for /etc/mdadm.conf that will work around this bug, but we were able to mitigate by adding a script unit to shutdown that unwinds the md/lvm/filesystem stack.
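A minimal sketch of the md step of such an unwind, not the script described above: it parses `/proc/mdstat`-style text, picks out the active arrays, and prints the `mdadm --stop` commands for everything outside a keep list. The array names `md0` and `md1` for the boot and OS arrays are assumptions, as are the function names. Filesystems would have to be unmounted and LVM volume groups deactivated before the arrays can actually be stopped; the sketch only prints commands and runs nothing.

```python
import re

# Matches lines such as "md127 : active raid1 sdc1[0] sdd1[1]".
# Inactive arrays ("md5 : inactive ...") are deliberately not matched.
MD_LINE = re.compile(r"^(md\w+)\s*:\s*active\b")


def read_mdstat(path="/proc/mdstat"):
    """Return the contents of /proc/mdstat, or "" if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


def active_arrays(mdstat_text):
    """Return the names of active md arrays, in the order they are listed."""
    names = []
    for line in mdstat_text.splitlines():
        match = MD_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def stop_commands(mdstat_text, keep):
    """Build one `mdadm --stop` command per active array not in `keep`."""
    return [
        ["mdadm", "--stop", f"/dev/{name}"]
        for name in active_arrays(mdstat_text)
        if name not in keep
    ]


if __name__ == "__main__":
    # Assumption: md0 and md1 are the boot and OS arrays and must stay up.
    for cmd in stop_commands(read_mdstat(), keep={"md0", "md1"}):
        print(" ".join(cmd))  # dry run: print instead of executing
```

Printing the commands first makes it easy to check that the keep list is right before wiring anything into a shutdown unit.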

I’m curious to know whether our situation is similar to that of others experiencing this issue. If so, it might be easier to patch the shutdown process in Rocky than to wait for RH to backport the revert.

toracat · January 19, 2023, 6:12pm

Another workaround is to use ELRepo’s kernel-ml.

toracat · January 23, 2023, 10:45am

I now have a 9.1 kernel set that was built with the revert commit. The packages are available here:

https://toracat.org/test/kernel/4c66a32/

Please test if you can. Note that they are not signed and are offered for testing purposes only.

hartings · January 25, 2023, 11:47am

I have exactly the same error lines and shutdown problems on my Rocky 9.1 workstation, running kernel-5.14.0-162.6.1.el9_1.0.1.x86_64.
My disks are in RAID1 using mdadm.
The machine doesn’t shut down, either from the GNOME “shutdown” button in the top right-hand corner of the screen or with “shutdown -h 0” as root.
I have to press and hold the machine’s power button, which is not a good approach and is easy to forget…
I expect many users have this issue?
After the fresh install it did shut down properly, if I remember correctly. I then installed several packages I need, and I have no idea whether any of them is part of the problem, or which…

toracat · January 30, 2023, 8:26pm

An AlmaLinux user who was hit by the same issue tested with the patched kernel and reported back with a successful result:

https://bugs.almalinux.org/view.php?id=359

lmann · May 15, 2023, 7:34am

Perhaps having the shutdown scripts unwind the md/lvm/filesystem stack would be good practice even without this bug?

Cphusion · May 15, 2023, 1:21pm

Is this fix already included in the current Rocky Linux 9.1 kernel, 5.14.0-162.23.1.el9_1.x86_64?

toracat · May 15, 2023, 7:28pm

The fix is in the 9.2 GA kernel, 5.14.0-284.11.1.el9_2, so you’d need to update to the 9.2 kernel.
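
A rough way to check a machine against that build is to compare kernel release strings numerically. The sketch below assumes EL-style release strings (`X.Y.Z-build.elN...`) and treats anything at or above 5.14.0-284.11.1.el9_2 as carrying the fix, which is an inference from this thread and not a guarantee for every build. The function names are illustrative.

```python
import platform
import re

# Build named in this thread as the first one known to contain the fix.
FIXED = "5.14.0-284.11.1.el9_2"

# Captures "5.14.0" and the numeric build ("284.11.1") that precedes ".elN".
_EL_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)\.el\d+")


def parse_el_kernel(release):
    """Turn '5.14.0-162.6.1.el9_1...' into a comparable tuple of ints."""
    match = _EL_RELEASE.match(release)
    if match is None:
        raise ValueError(f"not an EL-style kernel release: {release!r}")
    version = tuple(int(part) for part in match.group(1, 2, 3))
    build = tuple(int(part) for part in match.group(4).split("."))
    return version + build


def has_fix(release, fixed=FIXED):
    """True if `release` sorts at or above the fixed build (heuristic)."""
    return parse_el_kernel(release) >= parse_el_kernel(fixed)


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` reports
    try:
        state = "at or above" if has_fix(running) else "older than"
        print(f"{running} is {state} {FIXED}")
    except ValueError as err:
        print(err)
```

Both 9.1 kernels mentioned in this thread, 5.14.0-162.6.1.el9_1.0.1 and 5.14.0-162.23.1.el9_1, compare as older than the fixed build.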