Rocky 9.5 doesn't boot since I forced Nvidia updates

I went through a series of steps, and I don’t know which one broke it. Now when I try to boot, I get a few error messages that flash by too fast to note down (I was getting those before, too), then a new message (probably since the 9.5 upgrade) warning me that my CPU is obsolete and will not be supported in some future release. After that come several minutes of a small arrow chasing its tail around and around, and then the machine shuts down with no other visible messages.

  1. Prior to the 9.5 upgrade, a bunch of nvidia packages came up for update whenever I ran “sudo dnf update”, but they wouldn’t install, for reasons like: file so-and-so from package such-and-such conflicts with file so-and-so from package such-and-such-later-version. I figured there were some other updates pending that weren’t ready yet.
  2. 9.5 became available, so I ran “sudo dnf upgrade”, which installed nearly 500 packages. The machine rebooted fine afterward.
  3. I then ran “sudo dnf update” again and got the same batch of updates for nvidia (and a few other things), but the rest would install only if I excluded *nvidia*.
  4. I tried several things in an attempt to get the nvidia updates installed, finally boiling down to “sudo dnf --allowerasing”, which seemed like all it should do is erase an existing package before installing an update for it. It appeared to do the trick, as dnf claimed to have installed everything.
  5. So, reboot…
  6. Nothing (useful) happens.
  7. It stops and hangs before getting to the grub boot menu but, as described above, eventually powers off without further visible output or any input from me.

I’ve got it booted now from a 9.5 live CD, and I can use the Disks utility to examine the filesystems/partitions on the boot drives (a pair of SSDs in RAID1). Some of them were reported as needing repair, so I ran the appropriate tool from a text console window, and after that Disks said they were now good.

Tried a reboot:
Nope. No such luck.

So now I’m trying to figure out where to go next. I know I COULD reinstall, but I’d really prefer not to.

Much of the non-system content that I put on it is actually on a NAS, but I’ve spent a year and a half tweaking the system to work the way I like (and of course didn’t keep records), and I don’t want to have to start all over.

I do have nightly/weekly/monthly backups, but apparently don’t have a backup of /boot or /boot/efi.

Am thinking about reinstalling grub but have never done that before on a (previously) working system, so I am asking for pointers to details on EXACTLY what one needs to do to make that happen.

Also thinking that perhaps the initial ram disk may have gotten hosed. I have fixed those before, but it’s been some years, so any advice on that would also be helpful.

Hoping for a little guidance here, Thanks in advance!

Fred

Hi,

How did you install nvidia drivers?

When you say it doesn’t boot, are you getting an error message? If you press Ctrl-Alt-F3, do you get a console?

I ask because I had a similar issue with the nvidia modules from ELRepo, which I suspect a reinstall would have fixed. Instead, I installed akmod-nvidia, as this should rebuild the modules when the kernel is updated.

Regards Tom

The nvidia drivers are from RPM Fusion,
updated as described earlier.

There are a couple of error messages, but they have been there for a long time.

I do need to go look at dmesg and /var/log/messages, etc., though.
But since I never see the grub menu, there may not be anything useful in those logs.

Fred

Here’s a bunch of lines from the end of /var/log/messages. Perhaps someone can suggest how to fix these errors?

Also, I would think it SHOULD have gone past the grub/boot menu long before getting to this point, but I saw no such screen despite trying to boot a number of times.

Nov 21 11:54:48 rockybox akmods[1300]: Building and installing nvidia-kmod[FAILED]
Nov 21 11:54:48 rockybox akmods[1300]: Could not install newly built RPMs. You can find them and the logfile in:
Nov 21 11:54:48 rockybox akmods[1300]: /var/cache/akmods/nvidia/550.127.05-1-for-5.14.0-503.14.1.el9_5.x86_64.failed.log
Nov 21 11:54:48 rockybox akmods[1300]: [FAILED]
Nov 21 11:54:48 rockybox systemd[1]: Finished Builds and install new kmods from akmod packages.
Nov 21 11:54:48 rockybox systemd[1]: Fallback to nouveau as nvidia did not load was skipped because of an unmet condition check (ConditionPathExists=!/sys/module/nvidia).
Nov 21 11:54:48 rockybox systemd[1]: Starting Light Display Manager…
Nov 21 11:54:48 rockybox systemd[1]: Received SIGRTMIN+21 from PID 505 (plymouthd).
Nov 21 11:54:48 rockybox systemd[1]: Started Light Display Manager.
Nov 21 11:54:48 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
Nov 21 11:54:48 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
Nov 21 11:54:48 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
Nov 21 11:54:48 rockybox systemd[1]: Created slice Slice /system/systemd-coredump.
Nov 21 11:54:48 rockybox systemd[1]: Started Process Core Dump (PID 16088/UID 0).
Nov 21 11:54:48 rockybox systemd-coredump[16089]: Process 16087 (Xorg) of user 0 dumped core.#012#012Stack trace of thread 16087:#012#0 0x00007fc25d08ba6c __pthread_kill_implementation (libc.so.6 + 0x8ba6c)#012#1 0x00007fc25d03e686 raise (libc.so.6 + 0x3e686)#012#2 0x00007fc25d028833 abort (libc.so.6 + 0x28833)#012#3 0x000055e9d12fa970 OsAbort (Xorg + 0x1db970)#012#4 0x000055e9d12fb5e4 FatalError (Xorg + 0x1dc5e4)#012#5 0x000055e9d116be6a main (Xorg + 0x4ce6a)#012#6 0x00007fc25d0295d0 __libc_start_call_main (libc.so.6 + 0x295d0)#012#7 0x00007fc25d029680 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29680)#012#8 0x000055e9d116bea5 _start (Xorg + 0x4cea5)#012ELF object binary architecture: AMD x86-64
    Nov 21 11:54:48 rockybox systemd[1]: systemd-coredump@0-16088-0.service: Deactivated successfully.
    Nov 21 11:54:48 rockybox systemd[1]: Received SIGRTMIN+21 from PID 505 (plymouthd).
    Nov 21 11:54:49 rockybox systemd[1]: Finished Hold until boot process finishes up.
    Nov 21 11:54:49 rockybox systemd[1]: Reached target Multi-User System.
    Nov 21 11:54:49 rockybox systemd[1]: Reached target Graphical Interface.
    Nov 21 11:54:49 rockybox systemd[1]: Starting Record Runlevel Change in UTMP…
    Nov 21 11:54:49 rockybox systemd[1]: lightdm.service: Main process exited, code=exited, status=1/FAILURE
    Nov 21 11:54:49 rockybox systemd[1]: lightdm.service: Failed with result ‘exit-code’.
    Nov 21 11:54:49 rockybox systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
    Nov 21 11:54:49 rockybox systemd[1]: Finished Record Runlevel Change in UTMP.
    Nov 21 11:54:49 rockybox systemd[1]: Startup finished in 3ms (firmware) + 55us (loader) + 1.556s (kernel) + 2.980s (initrd) + 2min 21.967s (userspace) = 2min 26.507s.
    Nov 21 11:54:49 rockybox systemd[1]: lightdm.service: Scheduled restart job, restart counter is at 1.
    Nov 21 11:54:49 rockybox systemd[1]: Stopped Light Display Manager.
    Nov 21 11:54:49 rockybox systemd[1]: Starting Light Display Manager…
    Nov 21 11:54:49 rockybox systemd[1]: Started Light Display Manager.
    Nov 21 11:54:49 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:49 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:49 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:49 rockybox systemd[1]: Started Process Core Dump (PID 16103/UID 0).
    Nov 21 11:54:49 rockybox systemd-coredump[16104]: Process 16102 (Xorg) of user 0 dumped core.#012#012Stack trace of thread 16102:#012#0 0x00007faab628ba6c __pthread_kill_implementation (libc.so.6 + 0x8ba6c)#012#1 0x00007faab623e686 raise (libc.so.6 + 0x3e686)#012#2 0x00007faab6228833 abort (libc.so.6 + 0x28833)#012#3 0x000055985bfdd970 OsAbort (Xorg + 0x1db970)#012#4 0x000055985bfde5e4 FatalError (Xorg + 0x1dc5e4)#012#5 0x000055985be4ee6a main (Xorg + 0x4ce6a)#012#6 0x00007faab62295d0 __libc_start_call_main (libc.so.6 + 0x295d0)#012#7 0x00007faab6229680 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29680)#012#8 0x000055985be4eea5 _start (Xorg + 0x4cea5)#012ELF object binary architecture: AMD x86-64
    Nov 21 11:54:49 rockybox systemd[1]: systemd-coredump@1-16103-0.service: Deactivated successfully.
    Nov 21 11:54:49 rockybox systemd[1]: lightdm.service: Main process exited, code=exited, status=1/FAILURE
    Nov 21 11:54:49 rockybox systemd[1]: lightdm.service: Failed with result ‘exit-code’.
    Nov 21 11:54:49 rockybox systemd[1]: lightdm.service: Scheduled restart job, restart counter is at 2.
    Nov 21 11:54:49 rockybox systemd[1]: Stopped Light Display Manager.
    Nov 21 11:54:49 rockybox systemd[1]: Starting Light Display Manager…
    Nov 21 11:54:49 rockybox systemd[1]: Started Light Display Manager.
    Nov 21 11:54:50 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:50 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:50 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:50 rockybox systemd[1]: Started Process Core Dump (PID 16116/UID 0).
    Nov 21 11:54:50 rockybox systemd-coredump[16117]: Process 16115 (Xorg) of user 0 dumped core.#012#012Stack trace of thread 16115:#012#0 0x00007fc25528ba6c __pthread_kill_implementation (libc.so.6 + 0x8ba6c)#012#1 0x00007fc25523e686 raise (libc.so.6 + 0x3e686)#012#2 0x00007fc255228833 abort (libc.so.6 + 0x28833)#012#3 0x0000562666fd0970 OsAbort (Xorg + 0x1db970)#012#4 0x0000562666fd15e4 FatalError (Xorg + 0x1dc5e4)#012#5 0x0000562666e41e6a main (Xorg + 0x4ce6a)#012#6 0x00007fc2552295d0 __libc_start_call_main (libc.so.6 + 0x295d0)#012#7 0x00007fc255229680 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29680)#012#8 0x0000562666e41ea5 _start (Xorg + 0x4cea5)#012ELF object binary architecture: AMD x86-64
    Nov 21 11:54:50 rockybox systemd[1]: systemd-coredump@2-16116-0.service: Deactivated successfully.
    Nov 21 11:54:50 rockybox systemd[1]: lightdm.service: Main process exited, code=exited, status=1/FAILURE
    Nov 21 11:54:50 rockybox systemd[1]: lightdm.service: Failed with result ‘exit-code’.
    Nov 21 11:54:50 rockybox systemd[1]: lightdm.service: Scheduled restart job, restart counter is at 3.
    Nov 21 11:54:50 rockybox systemd[1]: Stopped Light Display Manager.
    Nov 21 11:54:50 rockybox systemd[1]: Starting Light Display Manager…
    Nov 21 11:54:50 rockybox systemd[1]: Started Light Display Manager.
    Nov 21 11:54:50 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:50 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:50 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:50 rockybox systemd[1]: Started Process Core Dump (PID 16129/UID 0).
    Nov 21 11:54:50 rockybox systemd-coredump[16130]: Process 16128 (Xorg) of user 0 dumped core.#012#012Stack trace of thread 16128:#012#0 0x00007f825028ba6c __pthread_kill_implementation (libc.so.6 + 0x8ba6c)#012#1 0x00007f825023e686 raise (libc.so.6 + 0x3e686)#012#2 0x00007f8250228833 abort (libc.so.6 + 0x28833)#012#3 0x0000560d9c65a970 OsAbort (Xorg + 0x1db970)#012#4 0x0000560d9c65b5e4 FatalError (Xorg + 0x1dc5e4)#012#5 0x0000560d9c4cbe6a main (Xorg + 0x4ce6a)#012#6 0x00007f82502295d0 __libc_start_call_main (libc.so.6 + 0x295d0)#012#7 0x00007f8250229680 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29680)#012#8 0x0000560d9c4cbea5 _start (Xorg + 0x4cea5)#012ELF object binary architecture: AMD x86-64
    Nov 21 11:54:50 rockybox systemd[1]: systemd-coredump@3-16129-0.service: Deactivated successfully.
    Nov 21 11:54:50 rockybox systemd[1]: lightdm.service: Main process exited, code=exited, status=1/FAILURE
    Nov 21 11:54:50 rockybox systemd[1]: lightdm.service: Failed with result ‘exit-code’.
    Nov 21 11:54:50 rockybox systemd[1]: lightdm.service: Scheduled restart job, restart counter is at 4.
    Nov 21 11:54:50 rockybox systemd[1]: Stopped Light Display Manager.
    Nov 21 11:54:50 rockybox systemd[1]: Starting Light Display Manager…
    Nov 21 11:54:50 rockybox systemd[1]: Started Light Display Manager.
    Nov 21 11:54:51 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:51 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:51 rockybox kernel: NVRM: API mismatch: the client has the version 550.127.05, but#012NVRM: this kernel module has the version 550.107.02. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.
    Nov 21 11:54:51 rockybox systemd[1]: Started Process Core Dump (PID 16142/UID 0).
    Nov 21 11:54:51 rockybox systemd-coredump[16143]: Process 16141 (Xorg) of user 0 dumped core.#012#012Stack trace of thread 16141:#012#0 0x00007efe9f68ba6c __pthread_kill_implementation (libc.so.6 + 0x8ba6c)#012#1 0x00007efe9f63e686 raise (libc.so.6 + 0x3e686)#012#2 0x00007efe9f628833 abort (libc.so.6 + 0x28833)#012#3 0x000055fae5a19970 OsAbort (Xorg + 0x1db970)#012#4 0x000055fae5a1a5e4 FatalError (Xorg + 0x1dc5e4)#012#5 0x000055fae588ae6a main (Xorg + 0x4ce6a)#012#6 0x00007efe9f6295d0 __libc_start_call_main (libc.so.6 + 0x295d0)#012#7 0x00007efe9f629680 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29680)#012#8 0x000055fae588aea5 _start (Xorg + 0x4cea5)#012ELF object binary architecture: AMD x86-64
    Nov 21 11:54:51 rockybox systemd[1]: systemd-coredump@4-16142-0.service: Deactivated successfully.
    Nov 21 11:54:51 rockybox systemd[1]: lightdm.service: Main process exited, code=exited, status=1/FAILURE
    Nov 21 11:54:51 rockybox systemd[1]: lightdm.service: Failed with result ‘exit-code’.
    Nov 21 11:54:51 rockybox systemd[1]: lightdm.service: Scheduled restart job, restart counter is at 5.
    Nov 21 11:54:51 rockybox systemd[1]: Stopped Light Display Manager.
    Nov 21 11:54:51 rockybox systemd[1]: lightdm.service: Start request repeated too quickly.
    Nov 21 11:54:51 rockybox systemd[1]: lightdm.service: Failed with result ‘exit-code’.
    Nov 21 11:54:51 rockybox systemd[1]: Failed to start Light Display Manager.
    Nov 21 11:54:51 rockybox systemd[1]: lightdm.service: Triggering OnFailure= dependencies.
    Nov 21 11:54:51 rockybox systemd[1]: Starting Terminate Plymouth Boot Screen…
    Nov 21 11:54:51 rockybox systemd[1]: Finished Terminate Plymouth Boot Screen.
    Nov 21 12:07:34 rockybox systemd[1]: Starting Cleanup of Temporary Directories…
    Nov 21 12:07:34 rockybox systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
    Nov 21 12:07:34 rockybox systemd[1]: Finished Cleanup of Temporary Directories.
    Nov 21 12:07:34 rockybox systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
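
A note on reading the excerpt above: the "Startup finished" line suggests the machine did get through the loader, kernel, and initrd, and the repeated failure is Xorg crashing under lightdm while the kernel logs an NVRM API mismatch. As a rough sketch (the function name `nvrm_mismatch_summary` is made up for illustration, and the pattern assumes the message wording shown in this log), the two driver versions can be pulled out of a messages file like this:

```shell
#!/bin/sh
# Summarise NVRM "API mismatch" lines read from stdin.
# Prints one "client=<version> module=<version>" line per distinct pair.
# The pattern is written against the wording seen in the log excerpt above.
nvrm_mismatch_summary() {
    sed -n -E 's/.*NVRM: API mismatch: the client has the version ([0-9.]+), but.*this kernel module has the version ([0-9.]+)\..*/client=\1 module=\2/p' \
        | sort -u
}
```

Fed the excerpt above (for example `nvrm_mismatch_summary < /var/log/messages` on the installed system, or the same file under wherever its root is mounted from the live CD), this should print `client=550.127.05 module=550.107.02`: the userspace driver was updated to 550.127.05 while the loaded kernel module was still 550.107.02, which lines up with the akmods build failure at the top of the excerpt.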

I can’t get to a state where I can open a terminal or a console window, so I don’t know how to do that.

Hi,

You could try a chroot from the livecd.

Remove the akmod driver. The OS should now boot with the nouveau driver, then reinstall the akmod driver.

Regards Tom.
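
For anyone following along, the chroot suggestion above might look roughly like the sketch below. It is a dry run by default and only prints the commands. The root device `/dev/md127`, the mount point `/mnt/sysimage`, and the helper names `run` and `rescue_steps` are all assumptions for illustration; the real device name has to come from the live environment (for example from `lsblk`), and the package name should match whatever is actually installed.

```shell
#!/bin/sh
# Dry-run sketch: prints the rescue commands instead of running them.
# Set DRY_RUN=0 (as root, in the live environment) to execute them for real.
DRY_RUN="${DRY_RUN:-1}"
ROOT_DEV="${ROOT_DEV:-/dev/md127}"   # hypothetical: root filesystem of the installed system
TARGET="${TARGET:-/mnt/sysimage}"    # hypothetical mount point
PKG="${PKG:-akmod-nvidia}"           # package named in the reply above

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rescue_steps() {
    run mkdir -p "$TARGET"
    run mount "$ROOT_DEV" "$TARGET"
    # Separate /boot and /boot/efi filesystems, if any, would be mounted
    # under $TARGET at this point as well.
    for fs in dev proc sys run; do
        run mount --bind "/$fs" "$TARGET/$fs"
    done
    # Remove the driver from inside the installed system.
    run chroot "$TARGET" dnf remove "$PKG"
    # Unwind the mounts in reverse order.
    for fs in run sys proc dev; do
        run umount "$TARGET/$fs"
    done
    run umount "$TARGET"
}

rescue_steps
```

Once the system boots again on nouveau, the second half of the advice is to reinstall the driver from the running system (with RPM Fusion, something like `sudo dnf install akmod-nvidia`) and give akmods time to build the module before rebooting.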


Yessss!!!

That did the trick!

And it makes sense that it would, now that my nose was rubbed in it.

Thanks a ton for the advice!

Fred
