The Further Experiments of a Vacuous Experimenter

Was: “Trouble in River City – Rocky Linux 8.5 Rollover Blew Up”

A quick recap from yesterday:

First, I tried jlehtone’s suggestion of running history | grep nvidia, which worked like a charm. When I ran it, it gave me two items: 1) kmod-nvidia and 2) kmod-nvidia --allowerasing. I then erased kmod-nvidia and did a reinstall. While that did not solve the problem, it got me to the point where the system tried to start the login page before failing with a whole lot of error messages that ran on and on. It also gave me the warning “Nvidia Kernel Module MISSING (!!!) Falling back to Nouveau”.

Second, Bobar wrote: “You’re not alone. Just ran dnf upgrade on RL 8.4 and it is now fubarred (had KDE 5.18 and there were Nvidia things in there … on which I gave up eventually - got an old radeon). Luckily that was on a backup I had restored to a VM (I’m just not a very trusting person).”

Third, while I was discussing a trial install of Rocky Linux 8.5 on my 2 TB HDD with my buddy, he suggested that I put the DVD1.iso on the 1 TB HDD my sister no longer wanted and sent to me (I use it more or less as a scratch drive) rather than on a thumb drive. On a sudden impulse I checked what was on that drive, as I thought I might have put a BACKUP on it. I did!!! Unfortunately, the BACKUP was of CentOS 8.4, not Rocky Linux 8.4. This gave me the bright idea of trying to “Back Door” my way into Rocky Linux 8.5 by:

  1. Erasing everything on my 2 TB HDD to have a clean drive for my experiments

  2. Copying the BACKUP off the 1 TB “scratch disk” to the clean 2 TB HDD

  3. Updating CentOS 8.4, and in the process rolling it over to CentOS 8.5

  4. Converting CentOS 8.5 ==> Rocky Linux 8.5

Fourth, late last night, having donned my large Navy Blue Bathrobe with Large Sleeves, I started the process, Wiped Out everything on the 2 TB HDD and made it CLEAN! … Then I fell asleep! That brings you up to date… so far!!

This morning I proceeded to copy the entire CentOS 8.4 BACKUP to the 2 TB drive using the dd command. Having done that, I booted CentOS 8.4, not once but several times, and each time it popped right up. This way I confirmed I did not have a flaky OS. Having proved that point, I next proceeded to update (and in effect roll over) CentOS 8.4 to 8.5, some 2150 files in all. Hours later the rollover/update was complete, and I rebooted the machine. IT NEVER HAPPENED!!! Instead, what I DID get was the EXACT same thing that occurred with Rocky Linux 8.5.

To further prove that I had replicated the Rocky Linux 8.5 problem, I edited the BLUL (Big Long Ugly Line), logged in as root and issued “startx”, which, as with RL 8.5, dropped me into GNOME. I was able to go wandering around in GNOME all I wished but could not get to KDE. I did get the following message:

“Screen Lock Disabled”
“Screen Locking requires GNOME Display Manager.”

I then ran jlehtone’s history | grep nvidia. It gave me the exact same output I got on RL 8.5, plus a few other entries, including ‘install nvidia-detect’ and ‘nvidia-detect’, as well as the ones for ‘kmod-nvidia’. I may (or may NOT) have found a clue in the following:

"Updating Subscription Management repositories."
"Unable to read Consumer Identity"

"This system is not registered with an entitlement server. You can use Subscription-Manager to register".

Now correct me if I am WRONG, but this thing about Subscription Management is a Red Hat thing that is limited to RHEL. I should not be forced to “register” my workstation with an “entitlement server”, especially since I am not running RHEL 8.5.

CONCLUSIONS: (So Far)

  1. This is NOT a faulty or corrupt file left over from when I updated Rocky Linux 8.4 => 8.5, as I ended up with the EXACT same problems when I rolled over CentOS 8.4 => CentOS 8.5, after first copying it to a CLEAN 2 TB HDD. CentOS 8.4 worked just fine over and over; CentOS (presumably) 8.5 FAILED as soon as I updated 8.4.

  2. The ERRORS that did pop up were EXACTLY the same on Rocky Linux 8.5 and CentOS 8.5.

  3. I do not have a clue what this jazz is about needing to be “registered with an entitlement server. You can use Subscription-Manager to register”. This sounds a lot like Red Hat is trying to FORCE users of FREE UNSUPPORTED OS’s such as CentOS, Rocky Linux and AlmaLinux to use (and pay for) RHEL to avoid the nightmares we are encountering. This may be a drawback to using a bug-for-bug version of RHEL: whatever Red Hat puts into their OS will, by the nature of the beast, get copied into the clones. In short, this may be a way for Red Hat to sabotage the clones… unless you decide to be a guinea pig for them by signing up to use Fedora or CentOS Stream.

  4. I have no idea what is going on with KDE and why I can’t bring up a login page when I used to be able to.

  5. Finally, I am not sure what is going on with Nvidia. Nvidia, like KDE, worked BEFORE the UPDATE; now NEITHER works.

  6. My hopes of “Back Dooring” into Rocky Linux 8.5 are now pretty much DEAD. Now I go in search of obscure work-arounds.

Future EXPERIMENTS are planned to try and answer these questions.

For those who may wish to dive into the Dark Arts here are the specs for ocelot:

Motherboard: ASUS PRIME X570-Pro

CPU: AMD Ryzen 9 5900X 3700 MHz 12 core

Memory: 64 GB (expandable to 128 GB) of G.Skill Trident Z Neo DDR4-3600, 2x32 GB

DRIVES:
NVMe – Corsair Force MP600 PCIe Gen 4.0: 1 TB
SSD – Crucial MX500: 1 TB
HDD – Seagate Barracuda: 2 TB

Graphics Card: ASUS Nvidia GeForce GTX 1050 Ti

PSU: Corsair TX750M

OS’s:
Rocky Linux 8.5 (on the 1 TB NVMe Gen 4.0)
openSUSE Leap 15.3 (on the 1 TB SSD – /dev/sda)
CentOS 8.5 (so far) (on the 2 TB HDD – /dev/sdb)

GUI: KDE Plasma – 5.18

Parts that may be contributing factors: for the most part this is a more-or-less vanilla rig, but the CPU (AMD Ryzen 9 5900X); the amount of RAM (64 GB Trident Z Neo 3600); the NVMe Gen 4.0 drive; the Nvidia GeForce GTX 1050 Ti graphics card (rather vanilla, but not a Radeon); and finally the fact that I’m running KDE, not GNOME, all stand out. Any of these things may be skewing things up, though I tend to doubt it, as CentOS 8.4 and Rocky Linux 8.4 both worked and nothing other than the updates has changed.

It would be nice to run this experiment again, but with a Radeon graphics card, to see if that is the problem. I have other vacuous experiments planned. If you have run into any problems after updating your system, please post. Please state whether you are using KDE, GNOME or something else, and whether your graphics card is a Radeon or an Nvidia.

D’Cat

What do you get with

dnf list installed \*nvidia\*

dnf list installed *nvidia* : CentOS 8.5

Output for dnf list installed /nvidia/ : CentOS 8.5

[Rocky Linux 8.5 coming up next]

This system is not registered with an entitlement server. You can use subscription-manager to register.

Installed Packages
akmod-nvidia-390xx.x86_64 3:390.144-3.el8 @rpmfusion-nonfree-updates
kmod-nvidia-390xx.x86_64 3:390.143-1.el8 @rpmfusion-nonfree-updates
kmod-nvidia-390xx.x86_64 3:390.144-3.el8 @rpmfusion-nonfree-updates
kmod-nvidia-390xx-4.18.0-305.10.2.el8_4.x86_64.x86_64 3:390.144-3.el8 @@commandline
kmod-nvidia-390xx-4.18.0-305.12.1.el8_4.x86_64.x86_64 3:390.144-3.el8 @@commandline
kmod-nvidia-390xx-4.18.0-305.el8.x86_64.x86_64 3:390.143-1.el8 @rpmfusion-nonfree-updates
nvidia-detect.x86_64 460.84-1.el8.elrepo @elrepo
nvidia-settings-390xx.x86_64 390.143-1.el8 @rpmfusion-nonfree-updates
nvidia-settings-390xx.x86_64 390.144-2.el8 @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-390xx.x86_64 3:390.143-2.el8 @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-390xx.x86_64 3:390.144-2.el8 @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-390xx-kmodsrc.x86_64 3:390.144-2.el8 @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-390xx-libs.x86_64 3:390.143-2.el8 @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-390xx-libs.x86_64 3:390.144-2.el8 @rpmfusion-nonfree-updates
[root@ocelot:~]/>

Here is the Output for Rocky Linux 8.5

dnf list installed /nvidia/

[root@ocelot:~]/>dnf list installed /nvidia/
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered with an entitlement server. You can use subscription-manager to register.

Installed Packages
kernel-devel.x86_64 4.18.0-305.10.2.el8_4 @baseos
kernel-devel.x86_64 4.18.0-305.12.1.el8_4 @baseos
kernel-devel.x86_64 4.18.0-348.2.1.el8_5 @baseos
linux-firmware.noarch
20201218-102.git05789708.el8
@baseos
linux-firmware.noarch
20210702-103.gitd79c2677.el8
@baseos
xorg-x11-drv-nvidia-390xx.x86_64
3:390.143-2.el8 @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-390xx.x86_64
3:390.144-2.el8 @rpmfusion-nonfree-updates
[root@ocelot:~]/>

First of all, you have NVidia packages from RPMFusion. As I am familiar only with ELRepo packages, I don’t know how RPMFusion’s packages behave or how they should be used.

Second, your Rocky appears to have two versions of xorg-x11-drv-nvidia-390xx (3:390.143-2.el8 and 3:390.144-2.el8), which makes no sense. Surely these are not “installonly” packages?
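Multiple installed versions of the same package are easy to spot mechanically. The sketch below is a hypothetical helper, not part of dnf: it reads saved dnf list installed output and prints every name.arch that is listed more than once. It assumes package names end in one of a few common architecture suffixes and skips lines that hold only a wrapped version or repository column. Kernel packages such as kernel-devel will legitimately show up, since they are installonly; a driver package such as xorg-x11-drv-nvidia-390xx appearing twice (as it does in both listings above, alongside several other 390xx packages in the CentOS one) is the anomaly worth chasing.

```shell
# find_duplicate_packages (hypothetical helper): read "dnf list installed"
# output on stdin and print each name.arch that appears more than once.
# Assumption: package names end in a common architecture suffix; lines
# holding only a wrapped version or @repo column are skipped.
find_duplicate_packages() {
    awk '
        $1 ~ /\.(x86_64|noarch|i686|aarch64)$/ { count[$1]++ }
        END {
            for (pkg in count)
                if (count[pkg] > 1)
                    print pkg
        }
    ' | LC_ALL=C sort
}
```

Usage would be along the lines of saving the listing with dnf list installed > installed.txt and then running find_duplicate_packages < installed.txt.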

The xorg-x11-drv-nvidia-390xx is most likely X11’s driver. There also has to be a kernel module. Again, having one driver without the other is odd, but it could be the way RPMFusion allows them to be. It does explain the error message that you have mentioned: X11 attempts to load NVidia’s driver, but has to fall back to Nouveau, because the kernel does not have NVidia’s module.
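The missing-module explanation can be checked against the listings. The CentOS listing shows kmod-nvidia-390xx packages whose names embed a kernel release (kmod-nvidia-390xx-4.18.0-305.10.2.el8_4.x86_64 and so on), while the Rocky listing shows an 8.5 kernel-devel at 4.18.0-348.2.1.el8_5. A minimal sketch, assuming that RPMFusion naming pattern: the hypothetical function below succeeds only if a saved listing contains a kmod-nvidia-390xx package built for a given kernel release. It only sees packages that are installed; it cannot tell whether the module actually loads.

```shell
# has_kmod_for_kernel RELEASE (hypothetical helper): read a saved
# "dnf list installed" listing on stdin and exit 0 only if it contains a
# kmod-nvidia-390xx package built for kernel RELEASE, for example
# 4.18.0-348.2.1.el8_5.x86_64.
# Assumption: RPMFusion-style names of the form kmod-nvidia-390xx-<release>.
has_kmod_for_kernel() {
    release=$1
    awk -v want="kmod-nvidia-390xx-$release" '
        # Strip the trailing ".arch" that dnf appends to every package name.
        { name = $1; sub(/\.[^.]+$/, "", name) }
        name == want { found = 1 }
        END { exit !found }
    '
}
```

For the running kernel, something like dnf list installed 'kmod-nvidia*' > kmods.txt followed by has_kmod_for_kernel "$(uname -r)" < kmods.txt || echo "no NVidia kmod for this kernel" would do.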

I would shift to ELRepo:

dnf remove xorg-x11-drv-nvidia-390xx
dnf install elrepo-release
dnf --disablerepo=rpmfusion* install nvidia-detect
dnf --disablerepo=rpmfusion* install $(nvidia-detect)
systemctl reboot
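After the reboot, one way to see whether the NVidia kernel module actually loaded, rather than X falling back to Nouveau, is to look for it in /proc/modules, the file lsmod reads. A minimal sketch; the function name is hypothetical, and the optional second argument exists only so the check can be tried against a saved copy of that file.

```shell
# module_loaded NAME [FILE] (hypothetical helper): exit 0 if kernel module
# NAME is listed in FILE (default /proc/modules, whose first column is the
# module name), exit 1 otherwise.
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

On the machine itself, module_loaded nvidia && echo "nvidia loaded" || echo "nvidia NOT loaded" gives a quick yes/no.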

jlehtone

Thanks!! Will try your suggestion. Can’t hurt, as this has become more or less a giant test bed, given that I have both CentOS 8.5 and Rocky Linux 8.5 and they both suffer from the exact same problems that appeared after the update. Yeah, I noticed the rpmfusion-nonfree-updates too, and wondered what happened to ELRepo, but thought this was simply something peculiar to Rocky Linux and/or Red Hat. I’ll let you know what I find.

D’Cat

jlehtone

RESULTS: All tests were carried out on the CentOS 8.5 drive.

The “dnf remove xorg-x11-drv-nvidia-390xx” worked perfectly.

The “dnf install elrepo-release” may or may not have worked, as the output said it was already installed.

The “dnf --disablerepo=rpmfusion* install nvidia-detect” and “dnf --disablerepo=rpmfusion* install $(nvidia-detect)” both FAILED. The correct form is “disablerepo=rpmfusion” followed by 1) dnf install nvidia-detect and 2) dnf install $(nvidia-detect).

Doing it this way answers the question, “Where does ‘rpmfusion-nonfree-updates’ come from?” Even though ELRepo is installed, EVERYTHING (akmod-nvidia, kmod-nvidia, nvidia-settings) DEFAULTS to @rpmfusion-nonfree-updates. The odd one out is nvidia-detect.x86_64, which DEFAULTS to @elrepo.
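The repository each installed package came from is the @repo column of dnf list installed, so the question of which repo a package defaults to can be read straight from that output. A minimal sketch with a hypothetical function name: it pairs each package name with its @repo column, including entries where dnf wrapped the version and repository onto following lines (as in the Rocky listing above), and sorts the result so packages are grouped by repository.

```shell
# packages_by_repo (hypothetical helper): read "dnf list installed" output
# on stdin and print "repo<TAB>name.arch" for each package, grouped by repo.
# Assumption: package names end in a common architecture suffix; a wrapped
# entry is paired with the next line whose last column starts with "@".
packages_by_repo() {
    awk '
        $1 ~ /\.(x86_64|noarch|i686|aarch64)$/ { name = $1 }
        $NF ~ /^@/ && name != "" { print $NF "\t" name; name = "" }
    ' | LC_ALL=C sort
}
```

Running dnf list installed '*nvidia*' | packages_by_repo would then show at a glance which NVidia packages came from @elrepo and which from @rpmfusion-nonfree-updates.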

I’ve got a few other experiments I am going to try. Tonight or tomorrow I’m going to nuke CentOS 8.5, reinstall CentOS 8.4 and try to work it from the other side of the equation: make GNOME the DEFAULT DE BEFORE I roll over (i.e. update) the machine, and see if I get a login page when I reboot. If NOT, then chances are very good the problem is with the Nvidia drivers or card, or… The question then becomes WHY I can still log into GNOME after I edit the BLUL and run startx, but still cannot access KDE.

'Tis a puzzle. I’ll keep you posted on my findings.

D’Cat

Why?
The RL had only xorg-x11-drv-nvidia-390xx. The CL had additional packages from RPMFusion.
I assumed that you wanted to fix the RL. On CL I would have cleaned out the additional content too.

The objective of “install elrepo-release” is to ensure that the package is installed. If it already is, then doing nothing is the correct action. If it is not, then installing it is the correct action. If the package is not installed after the command, then the command has failed (and would have shown error messages).
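The three cases just described can be written out explicitly. dnf install already behaves this way on its own (it reports that the package is already installed and does nothing), so this hypothetical wrapper only spells out the logic: skip if installed, otherwise install, and treat "still not installed afterwards" as a failure.

```shell
# ensure_installed PKG (hypothetical helper): make sure PKG is installed.
#   - already installed (rpm -q succeeds): do nothing, report success
#   - not installed: run dnf install, then re-check with rpm -q so that
#     "still missing afterwards" is reported as a failure
ensure_installed() {
    pkg=$1
    if rpm -q "$pkg" >/dev/null 2>&1; then
        echo "$pkg is already installed; nothing to do"
        return 0
    fi
    dnf install -y "$pkg" || return 1
    # Installed now? If not, the install failed even if dnf exited 0.
    rpm -q "$pkg" >/dev/null 2>&1
}
```

For example, ensure_installed elrepo-release run as root either reports that ELRepo’s release package is present or installs it.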

How did they fail?

What does that do?

What if you just do a fresh install of Rocky Linux 8.5?