We have a VM running Rocky Linux 8 and just did a dnf update to get the latest security updates.
Upon rebooting the system, the VM seemed to be unresponsive. On further investigation, looking at the VM console, I can see that it gets to the kernel boot screen, but when I let it proceed with the latest kernel, the screen blanks out and the VM becomes completely unresponsive. I don’t even see the kernel boot messages you would normally see when the kernel starts.
The VM is set up as a standard command-line install, hosting a web service and a web app. However, neither the web service nor the app should prevent the system from being accessible via the VM console or from outside.
I was able to get it to run on kernel-4.18.0-513.11.1.el8_9.x86_64, just not on kernel-4.18.0-513.24.1.el8_9 or 4.18.0-513.11.1.el8_9.0.1.
Looking into messages, I see no real indicators of issues, nor do I know what to look for regarding the hanging black screen, other than the fact that the VM was unresponsive in general.
Any suggestions as to what the issue may be? Again, I am not running any desktop on the box, so there should be no graphics driver issue. Even after letting it sit for 10+ minutes, the system is unresponsive to remote connections as well as the local VM console.
I think you are saying that if you boot into an older kernel it starts working perfectly, so are you able to see any messages “at all” from the previous failed boot?
Check also the exact logs from the dnf upgrade from when the new kernel was being installed.
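As a rough sketch of what checking those logs could look like: the helper below pulls only the kernel package lines out of the RPM transaction log, so scriptlet output around the kernel install is easier to find. The function name kernel_install_log is made up for illustration, and /var/log/dnf.rpm.log is assumed to be the log location (the usual one on Rocky 8).

```shell
# Show RPM transaction log lines that mention kernel packages.
# $1 is the log path; /var/log/dnf.rpm.log is assumed as the default.
kernel_install_log() {
    grep -E 'kernel(-core|-modules)?-[0-9]' "${1:-/var/log/dnf.rpm.log}"
}

# Typical use on the affected VM (reading the log usually needs root):
#   kernel_install_log
#   dnf history info last    # details of the most recent transaction
```

If the matching lines only show "Installed:" entries and nothing that looks like a scriptlet failure, the install itself probably completed normally.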
It may be that after each new kernel you have to run:
grub2-mkconfig -o /boot/grub2/grub.cfg
to populate any special kernel parameters that you have added to /etc/default/grub before you reboot to the new one. Now that you are past that install, you may need to use grubby to populate all the kernels' parameters.
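A minimal sketch of how grubby could be used to compare the parameters each installed kernel boots with. The helper name summarise_kernel_args is hypothetical, and it assumes `grubby --info=ALL` prints key="value" lines (kernel=..., args=...), which is its usual output format on EL8.

```shell
# Summarise `grubby --info=ALL` output as "<kernel path> :: <args>" lines,
# so differing boot parameters between kernels stand out.
# Reads grubby's key="value" text on stdin.
summarise_kernel_args() {
    awk -F'"' '
        /^kernel=/ { kernel = $2 }
        /^args=/   { print kernel " :: " $2 }
    '
}

# Typical use on the affected VM (needs root):
#   grubby --default-kernel
#   grubby --info=ALL | summarise_kernel_args
# If "rhgb quiet" is among the args, removing it may let kernel messages
# show instead of a blank screen (this changes the boot config, so use care):
#   grubby --update-kernel=ALL --remove-args="rhgb quiet"
```

If the working kernel and the failing ones show different args, that difference is a reasonable first thing to look at.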
I am using Rocky 9 now and don’t know offhand what steps I took when running Rocky 8.
I can’t see anything noticeable from the times of the previous failed boots.
As a note, it shows the kernel list. Booting to the new kernel or the one before it, I see the screen blank out, but no blinking cursor and none of the kernel boot process.
Looking at the DNF logs, I don’t see any issues around the time of the update, and subsequent entries don’t show any issues either, except that today I now see:
2024-04-09T01:29:56-0700 DEBUG error: Curl error (56): Failure when receiving data from the peer for hxxp://mirrors.egr.msu.edu/rockylinux/8.9/AppStream/x86_64/os/repodata/repomd.xml [Recv failure: Connection reset by peer] (hxxp://mirrors.egr.msu.edu/rockylinux/8.9/AppStream/x86_64/os/repodata/repomd.xml).
2024-04-09T01:29:56-0700 WARNING Errors during downloading metadata for repository 'appstream':
Curl error (56): Failure when receiving data from the peer for hxxp://mirrors.egr.msu.edu/rockylinux/8.9/AppStream/x86_64/os/repodata/repomd.xml [Recv failure: Connection reset by peer]
2024-04-09T01:29:56-0700 DDEBUG Cleaning up.
2024-04-09T01:29:56-0700 DDEBUG Plugins were unloaded.
2024-04-09T01:29:56-0700 SUBDEBUG
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/dnf/repo.py", line 573, in load
    ret = self._repo.load()
  File "/usr/lib64/python3.6/site-packages/libdnf/repo.py", line 397, in load
    return _repo.Repo_load(self)
libdnf._error.Error: Failed to download metadata for repo 'appstream': Cannot download repomd.xml: Curl error (56): Failure when receiving data from the peer for hxxp://mirrors.egr.msu.edu/rockylinux/8.9/AppStream/x86_64/os/repodata/repomd.xml [Recv failure: Connection reset by peer]
(I changed http to hxxp because of the forum's link restrictions; I am not trying to post actual links.)
By the end of the dnf log, though, those errors have stopped.
Note: this was just a dnf update, not an upgrade or reinstall. I have not made any changes to grub in any way. We have been running this box for at least 2 years under Rocky, and it was originally upgraded from CentOS 7 to Rocky 8.
So here is the snapshot I took when trying to boot the affected kernel(s). This is where it hangs. I don’t know if this helps. Given this is a production machine, I can’t let it sit too long. With the working one, it blows past too fast for me to capture the info.
If I have to, I can try to do it again (I am currently using VMware Remote Console), but I am not sure I can capture the output.
It is typical that a lot will happen after the actual error occurs (but before the halt caused by that error finally happens). That is why the “where it hangs” screen does not seem to tell you why.
And you probably have the same issue with the problematic one; way too much irrelevant text.
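One way to cut through the irrelevant text is to look for the earliest failure-looking lines rather than the last ones on screen. The sketch below does that for any log fed to it on stdin. The function name first_suspect_lines and the pattern list are illustrative guesses, not an exhaustive set, and the journalctl usage only helps if the journal is persistent and the failed boot got far enough to write to it.

```shell
# Print the earliest lines of a boot log that look like failures.
# Reads log text on stdin; $1 is how many matches to show (default 5).
# Each match is prefixed with its line number in the input.
first_suspect_lines() {
    grep -n -i -E 'panic|oops|bug:|call trace|failed|error' | head -n "${1:-5}"
}

# Typical use on the affected VM:
#   journalctl -k -b -1 --no-pager | first_suspect_lines
#   first_suspect_lines 10 < /var/log/messages
```

The line numbers in the output make it easy to jump to the surrounding context in the full log.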
I just want to add that I’m experiencing different problems than the OP, but with exactly the same kernel in Rocky 8. In my case, utilities provided by open-vm-tools-desktop inside VMware Workstation have broken, and the firewalld service will not start.