I have a new machine that I’m trying to get set up with a Rocky 9.5 server install. The install itself goes fine, I’m able to boot the machine and log into it without issue. When attempt to install packages via DNF I’m running into trouble. If I install more than a single package at a time (including dependencies) I get a hang during installation. At this point any SSH connections I have active to the device go inactive and the device stops responding to pings. The local session does not respond to any input and the display stops updating.
This happens when I try to install a variety of packages. The dnf.log and the dnf.rpm.log arent showing me anything thats giving me a clear picture of whats going wrong
dnf.log
2025-03-18T10:59:48-0400 INFO Installed size: 96 M
2025-03-18T10:59:49-0400 INFO Downloading Packages:
2025-03-18T11:00:04-0400 INFO ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-03-18T11:00:04-0400 INFO Total 2.6 MB/s | 37 MB 00:14
2025-03-18T11:00:04-0400 DEBUG Using rpmkeys executable at /usr/bin/rpmkeys to verify signatures
2025-03-18T11:00:04-0400 INFO Running transaction check
2025-03-18T11:00:04-0400 INFO Transaction check succeeded.
2025-03-18T11:00:04-0400 INFO Running transaction test
2025-03-18T11:00:04-0400 INFO Transaction test succeeded.
2025-03-18T11:00:04-0400 DDEBUG timer: transaction test: 444 ms
2025-03-18T11:00:04-0400 INFO Running transaction
2025-03-18T11:00:04-0400 DEBUG RPMDB altered outside of DNF.
2025-03-18T11:00:04-0400 DDEBUG RPM transaction start.
dnf.rpm.log
2025-03-18T10:57:38-0400 INFO --- logging initialized ---
2025-03-18T10:59:47-0400 INFO --- logging initialized ---
2025-03-18T11:00:05-0400 SUBDEBUG Installed: kernel-headers-5.14.0-503.31.1.el9_5.x86_64
I doubt its a space issue - I’ve done the same install process with rocky 9.4 on much smaller drives and did not run into this issue.
The machine in question is fairly new, its got a Intel Xeon E-2434 and a lot more RAM than 2 GB so it should specs should be fine unless theres some strange CPU compatibility issue Im not aware of. I’ll dig into those logs and see what I can find
As hinted above I suggest you open two terminals as user root.
In the first terminal you constantly monitor new message from
the kernel with command dmesg -w -T
In the second terminal you can try to run a dnf with a large set of packages (or try to download a larger file from an http(s) source with a command line tool like curl. )
The behaviour you describe normally results in new output shown by the dmesg command given above, e.g. information about network interfaces going down.
Gave it another try, this time did a dnf clean all followed by dnf install gcc. It hung installing the first package which was kernel-headers-5.14.0-503.31.1.el9_5.x86_64
Here’s the last few lines of output on dmesg -w -T
It does consistently get through downloading the packages, it only ever hangs during installation. The only major difference between this and a base install using the “Server” option in the installer is that there is a separate lv for /opt as I’m planning to install an application that stores a lot of data there
Don’t really see anything in the log files whatsoever. Suggest opening multiple sessions, one running top, one running iotop and see what is happening when the commands are ran in another terminal.
So far, impossible to diagnose it. I suppose you could run memtest on the server to see if you have defective hardware. Maybe even do firmware updates to the server.
When the display stops updating, does that include the mouse pointer freezing?
It almost sounds like hardware freeze, meaning logs won’t show anything.
I can imagine some %post script in the kernel rpms trying to run something that causes the freeze. Maybe try installing a package that does not try to update kernel packages, does it make any difference?
I assume you are able to force reboot somehow, and if that’s the case, I’d check logs from the very end of the last boot, scroll backwards to find the last ever entries before the freeze.
memtest86 reported no issues. It does seem to be a hardware freeze - the local monitor plugged into the device stops updating when the dnf hang occurs.
looking at /var/log/messages for the last attempt isnt giving me anything thats obvious to me as a smoking gun
Mar 20 10:51:20 rocky9 rsyslogd[1183]: cannot resolve hostname 'logcollector': Inappropriate ioctl for device [v8.2310.0-4.el9 try https://www.rsyslog.com/e/2027 ]
Mar 20 10:51:22 rocky9 systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Mar 20 10:51:56 rocky9 systemd[1]: Starting Fingerprint Authentication Daemon...
Mar 20 10:51:56 rocky9 systemd[1]: Started Fingerprint Authentication Daemon.
Mar 20 10:51:57 rocky9 rsyslogd[1183]: cannot resolve hostname 'logcollector' [v8.2310.0-4.el9 try https://www.rsyslog.com/e/2027 ]
Mar 20 10:51:59 rocky9 chronyd[831]: Selected source 108.61.73.243 (2.rocky.pool.ntp.org)
Mar 20 10:52:02 rocky9 auditd[776]: Error receiving audit netlink packet (No buffer space available)
Mar 20 10:55:00 rocky9 kernel: Linux version 5.14.0-503.31.1.el9_5.x86_64 (mockbuild@iad1-prod-build001.bld.equ.rockylinux.org) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.35.2-54.el9) #1 SMP PREEMPT_DYNAMIC Tue Mar 11 16:53:43 UTC 2025
The last line above is the first for the boot after manually power cycling the machine.
I am able to install other packages, for example if I do dnf install samba it and its prerequisites install as normal. When I do attempt to install a package that causes a hang the machine is able to boot normally afterwards with no manual intervention.
I did try performing a similar rocky9.5 install on some spare hardware using the same parameters and media and had no issues there. Its a different CPU family and chipset so not the greatest test but it at least confirms its not bad packages on the install media I’m using.
You could see if it works any better with Rocky 8.10 instead, since that is a 4.x kernel. Another alternative is adding elrepo repository and using a kernel-ml or kernel-lt package provided by them which would be a 6.x kernel. If you go that route, do it on Rocky 9 with the elrepo kernel before a clean install of Rocky 8.
At least that way you get to compare between 4.x, 5.x and 6.x kernels to see if it makes a difference.
That’s interesting, I was guesing regarding kernel packages triggering the issue.
I can see you were careful to get the exact lines. The error about auditd and the netlink packet seems to appear in your earlier testing as well, I wonder what it means, but I don’t think it’s the exact cause of the freeze.