Recent Kernel update (kernel-4.18.0-513.24.1) hangs after kernel selection boot

Greetings.

We have a VM running Rocky Linux 8 and just did a dnf update to get the latest security updates.

Upon rebooting the system, VM seemed to be unresponsive. On further investigation, looking at the VM console, I can see that it gets to the Kernel boot screen, but allowing it to go with the latest kernel, the screen would blank out and then be completely unresponsive. I don’t even see the kernel post messages you would see when you start the kernel.

The VM is setup as a standard command line while hosting web and a web app on there. However, the web service or the app running should prevent the system from being accessible via the VM Console or outside.

I was able to get it to run on kernel-4.18.0-513.11.1.el8_9.x86_64, just not kernel-4.18.0.513.24.1.el.8_9 nor 4.18.0.513.11.1.el8_9.0.1)

Looking into messages, I see no real indicators or issues or know what to look for as to the hanging black screen other than the fact the VM was unresponsive in general.

Any suggestions as to what the issue maybe? Again, not running any desktop on the box, so there is no graphics driver issue, system is unresponsive letting it sit there for 10+ minutes to remote connections as well as the local VM console.

I think you are saying that if you boot into an older kernel it starts working perfectly, so are you able to see any messages “at all” from the previous failed boot?

Check also the exact logs from the dnf upgrade from when the new kernel was being installed.

It may be that after each new kernel you have to run:
grub2-mkconfig -o /boot/grub2/grub.cfg
to populate any special kernel parameters that you have added to /etc/default/grub before you reboot to the new one. Now that you are past that install you may need to use grubby to populate all the kernels parameters.
I am using rocky 9 now and don’t know off hand what steps I took when running rocky 8.

I can’t see anything noticeable from the times the previous boots failing.

As a note, it shows the kernel list. Booting to the new kernel or the previous one before that, I see the screen blank out, but no blinking cursor nor any of the kernel boot process.

Looking at the DNF logs, I don’t see any issues around the time of the update and subsequent entries don’t show any issues except until today now where I see:

2024-04-09T01:29:56-0700 DEBUG error: Curl error (56): Failure when receiving data from the peer for hxxp://mirrors.egr.msu.edu/rockylinux/8.9/AppStream/x86_64/os/repodata/repomd.xml [Recv failure: Connection reset by peer] (hxxp://mirrors.egr.msu.edu/rockylinux/8.9/AppStream/x86_64/os/repodata/repomd.xml).
2024-04-09T01:29:56-0700 WARNING Errors during downloading metadata for repository ‘appstream’:

  • Curl error (56): Failure when receiving data from the peer for hxxp://mirrors.egr.msu.edu/rockylinux/8.9/AppStream/x86_64/os/repodata/repomd.xml [Recv failure: Connection reset by peer]
    2024-04-09T01:29:56-0700 DDEBUG Cleaning up.
    2024-04-09T01:29:56-0700 DDEBUG Plugins were unloaded.
    2024-04-09T01:29:56-0700 SUBDEBUG
    Traceback (most recent call last):
    File “/usr/lib/python3.6/site-packages/dnf/repo.py”, line 573, in load
    ret = self._repo.load()
    File “/usr/lib64/python3.6/site-packages/libdnf/repo.py”, line 397, in load
    return _repo.Repo_load(self)
    libdnf._error.Error: Failed to download metadata for repo ‘appstream’: Cannot download repomd.xml: Curl error (56): Failure when receiving data from the peer for hxxp://mirrors.egr.msu.edu/rockylinux/8.9/AppStream/x86_64/os/repodata/repomd.xml [Recv failure: Connection reset by peer]

(Changed the HTTP to hxxp due to forum exception and me not trying to actually make links)

Although at the end of dnf log, the errors there have stopped.

Note - This is just doing a dnf update, not upgrade or reinstall. I have not made any changes with grub in anyway and we have been running this box for at least 2 years under Rocky and this was an upgrade from CentOS 7 to Rocky 8.

Check all kernel entries in case one of them looks wrong. As root
grubby --info=ALL

Sadly, I am not sure if it is wrong.

What parameters are on the args= lines of the grubby --info=ALL output (in addition to ro resume=...)?

If there is $kernelopts, then what does kernelopts= have in output of grub2-editenv list ?

For args=

args=“ro crashkernel=128M rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet $tuned_params”

That is the same across the board except for index 4, which appears to be the rescue kernel. Index 2 is the one working at the moment.

As for kernelopts, I see this under the working environment:

kernelopts=root=/dev/mapper/centos-root ro crashkernel=128M rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet

Remove these. Then you should see more output during boot.


So here is the snapshot I took when trying to boot the affected kernel(s). This is where it hangs. I don’t know if this helps. Given this is a production machine, I can’t let it sit too long. With the working one, it blows past too fast for me to capture the info.

If I have to, I can try and do it again, (Currently using VMWare Remote Console), but not sure I can capture the output.

It is typical that a lot will happen after actual error happens (but before the halt caused by that error finally occurs). That is why the “where it hangs” does not seem to tell why.

And you probably have the same issue with the problematic one; way too much irrelevant text.

So no real way to fix the issue? Or perhaps try and reinstall the kernel?

I just want to add that I’m experiencing different problems than the OP, but with specifically the same kernel in Rocky 8. In my case, utilities provided by open-vm-tools-desktop inside VMWare workstation have broken, and the Firewalld service will not start.

For firewalld, try as root
systemctl --no-pager -l status firewalld