Kernel*-4.18.0-553.el8_10.x86_64 - unable to launch

chlowden · June 5, 2024, 10:45am

Hello
I was wondering if anyone else is having startup issues with kernel*-4.18.0-553.el8_10.x86_64, which has just become available?
The upgrade installed without issue, but when I pick this kernel, the screen immediately goes black and there are no command lines (as I get with the older kernels), so I presume that the kernel does not launch. When I choose an older kernel, the boot is fine.
This is my second attempt to install this kernel, so I figure it is not a download corruption. Has anyone else had a similar issue?
Thanks

jlehtone · June 5, 2024, 10:51am

You could test whether you can connect with ssh from another machine. If you can, then the issue is “only” in graphics.

chlowden · June 5, 2024, 1:25pm

The machine does not show up on my LAN. I don’t see any of the initial “code” as the OS boots, and the keyboard and mouse do nothing. It looks like the BIOS does not let the OS start, or it gets stuck immediately.
Thanks for your idea.

chlowden · June 7, 2024, 4:53am

I have upgraded my Dell 7530 to the latest BIOS but that has not resolved it. Is there anything very particular about this kernel?

toracat · June 7, 2024, 7:21pm

Are you using any kmod package from elrepo? If so, you’d need to update it when updating the kernel.

chlowden · June 8, 2024, 4:24am

When trying to remove a kmod I have installed, I get this ruby error and from what I understand, ruby is quite important.

sudo dnf update --nobest
Last metadata expiration check: 0:03:12 ago on sam. 08 juin 2024 06:18:35 CEST.
Dependencies resolved.

 Problem 1: package ruby-default-gems-3.3.1-2.module+el8.10.0+1827+16ecb9d2.noarch from appstream requires (rubygem(io-console) >= 0.5 with rubygem(io-console) < 1), but none of the providers can be installed
  - cannot install the best update candidate for package rubygem-did_you_mean-1.2.0-111.module+el8.9.0+1536+5f79634e.noarch
  - package rubygem-io-console-0.5.11-142.module+el8.10.0+1741+bdb5b6ca.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.11-142.module+el8.10.0+1741+bdb5b6ca.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.11-143.module+el8.10.0+1826+b62220b4.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.11-143.module+el8.10.0+1826+b62220b4.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.6-139.module+el8.9.0+1537+0b2034bd.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.6-139.module+el8.9.0+1537+0b2034bd.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.7-141.module+el8.6.0+1002+a7dba0ac.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.7-141.module+el8.6.0+1002+a7dba0ac.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.7-143.module+el8.10.0+1820+f3fffb92.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.7-143.module+el8.10.0+1820+f3fffb92.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.7.1-1.module+el8.10.0+1679+61871737.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.7.1-1.module+el8.10.0+1679+61871737.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.7.1-2.module+el8.10.0+1827+16ecb9d2.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.7.1-2.module+el8.10.0+1827+16ecb9d2.x86_64 from appstream is filtered out by modular filtering
 Problem 2: package ruby-default-gems-3.3.1-2.module+el8.10.0+1827+16ecb9d2.noarch from appstream requires (rubygem(io-console) >= 0.5 with rubygem(io-console) < 1), but none of the providers can be installed
  - cannot install the best update candidate for package rubygem-openssl-2.1.2-111.module+el8.9.0+1536+5f79634e.x86_64
  - package rubygem-io-console-0.5.11-142.module+el8.10.0+1741+bdb5b6ca.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.11-142.module+el8.10.0+1741+bdb5b6ca.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.11-143.module+el8.10.0+1826+b62220b4.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.11-143.module+el8.10.0+1826+b62220b4.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.6-139.module+el8.9.0+1537+0b2034bd.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.6-139.module+el8.9.0+1537+0b2034bd.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.7-141.module+el8.6.0+1002+a7dba0ac.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.7-141.module+el8.6.0+1002+a7dba0ac.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.7-143.module+el8.10.0+1820+f3fffb92.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.5.7-143.module+el8.10.0+1820+f3fffb92.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.7.1-1.module+el8.10.0+1679+61871737.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.7.1-1.module+el8.10.0+1679+61871737.x86_64 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.7.1-2.module+el8.10.0+1827+16ecb9d2.i686 from appstream is filtered out by modular filtering
  - package rubygem-io-console-0.7.1-2.module+el8.10.0+1827+16ecb9d2.x86_64 from appstream is filtered out by modular filtering
==========================================================================================================================================================================================
 Package                                       Architecture                   Version                                                        Repository                              Size
==========================================================================================================================================================================================
Installing:
 kernel                                        x86_64                         4.18.0-553.5.1.el8_10                                          baseos                                  10 M
 kernel-core                                   x86_64                         4.18.0-553.5.1.el8_10                                          baseos                                  43 M
 kernel-devel                                  x86_64                         4.18.0-553.5.1.el8_10                                          baseos                                  24 M
 kernel-modules                                x86_64                         4.18.0-553.5.1.el8_10                                          baseos                                  36 M
Upgrading:
 bpftool                                       x86_64                         4.18.0-553.5.1.el8_10                                          baseos                                  11 M
 cockpit                                       x86_64                         310.4-1.el8_10                                                 baseos                                  89 k
 cockpit-bridge                                x86_64                         310.4-1.el8_10                                                 baseos                                 500 k
 cockpit-system                                noarch                         310.4-1.el8_10                                                 baseos                                 5.2 M
 cockpit-ws                                    x86_64                         310.4-1.el8_10                                                 baseos                                 982 k
 fuse                                          x86_64                         2.9.7-19.el8                                                   baseos                                  82 k
 fuse-common                                   x86_64                         3.3.0-19.el8                                                   baseos                                  21 k
 fuse-libs                                     x86_64                         2.9.7-19.el8                                                   baseos                                 102 k
 kernel-headers                                x86_64                         4.18.0-553.5.1.el8_10                                          baseos                                  12 M
 kernel-tools                                  x86_64                         4.18.0-553.5.1.el8_10                                          baseos                                  11 M
 kernel-tools-libs                             x86_64                         4.18.0-553.5.1.el8_10                                          baseos                                  10 M
 libxml2                                       x86_64                         2.9.7-18.el8_10.1                                              baseos                                 696 k
 microsoft-edge-stable                         x86_64                         125.0.2535.92-1                                                microsoft-edge                         162 M
 python3-libxml2                               x86_64                         2.9.7-18.el8_10.1                                              baseos                                 237 k
 python3-perf                                  x86_64                         4.18.0-553.5.1.el8_10                                          baseos                                  10 M
Removing:
 kernel                                        x86_64                         4.18.0-513.18.1.el8_9                                          @baseos                                  0  
 kernel-core                                   x86_64                         4.18.0-513.18.1.el8_9                                          @baseos                                 71 M
 kernel-devel                                  x86_64                         4.18.0-513.18.1.el8_9                                          @baseos                                 53 M
 kernel-modules                                x86_64                         4.18.0-513.18.1.el8_9                                          @baseos                                 25 M
Skipping packages with broken dependencies:
 ruby-default-gems                             noarch                         3.3.1-2.module+el8.10.0+1827+16ecb9d2                          appstream                               84 k

Transaction Summary
==========================================================================================================================================================================================
Install   4 Packages
Upgrade  15 Packages
Remove    4 Packages
Skip      1 Package

Total download size: 338 M
Is this ok [y/N]:

But sadly this reboot has not resolved the issue. Is this ruby error of any consequence?

chlowden · June 8, 2024, 10:18am

Maybe more pertinent: when I start up in 4.18.0-513.24.1, I get a long list of errors before the OS loads. I have had these errors for a long time, even after completely reinstalling the OS, and they have not affected my use of the desktop.

After this list, a Dell logo appears and all is well.

But maybe these errors are stopping 553 from loading?

chlowden · June 8, 2024, 10:31am

Concerning kmod, I have reinstalled kmod-stex x86_64 6.02.0000.01-3.el8_10.elrepo, but I am now getting nvidia errors that I don’t remember seeing before. Should I ask elrepo to recompile stex for 553?
Many thanks

sudo dnf install kmod-stex
RPM Fusion for EL 8 - Free tainted              119 kB/s | 7.8 kB     00:00    
Dependencies resolved.
================================================================================
 Package        Arch        Version                           Repository   Size
================================================================================
Installing:
 kmod-stex      x86_64      6.02.0000.01-3.el8_10.elrepo      elrepo       32 k

Transaction Summary
================================================================================
Install  1 Package

Total download size: 32 k
Installed size: 67 k
Is this ok [y/N]: y
Downloading Packages:
kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64.r 147 kB/s |  32 kB     00:00    
--------------------------------------------------------------------------------
Total                                            51 kB/s |  32 kB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1 
  Installing       : kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64          1/1 
  Running scriptlet: kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64          1/1 
depmod: ERROR: fstatat(4, nvidia-drm.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-modeset.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-peermem.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-uvm.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia.ko.xz): No such file or directory
depmod: WARNING: /lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/uapfs.ko.xz needs unknown symbol ufsd_readahead_gfp_mask
depmod: WARNING: /lib/modules/4.18.0-553.el8_10.x86_64/kernel/external/uapfs/uapfs.ko needs unknown symbol ufsd_readahead_gfp_mask

  Verifying        : kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64          1/1 

Installed:
  kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64

toracat · June 8, 2024, 6:27pm

kmod-stex-6.02.0000.01-3.el8_10.elrepo was indeed built against the 8.10 GA kernel-4.18.0-553.el8_10. So that’s the one to be used with EL8.10.

chlowden · June 9, 2024, 3:15pm

Concerning the Nvidia errors, I found the page below that seems to be a very similar issue to mine …
https://www.reddit.com/r/redhat/comments/xuvxb8/nvidia_driver_issue/

It suggests reinstalling the NVIDIA drivers and the kernel-devel and kernel-headers packages:

    sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)

    sudo dnf module install -y nvidia-driver:latest-dkms

    reboot

but when I do it, I get this error …

sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Last metadata expiration check: 1:17:02 ago on dim. 09 juin 2024 07:01:15 CEST.
Package kernel-devel-4.18.0-513.24.1.el8_9.x86_64 is already installed.
No match for argument: kernel-headers-4.18.0-513.24.1.el8_9.x86_64
Error: Unable to find a match: kernel-headers-4.18.0-513.24.1.el8_9.x86_64

I have forced an install of the 553 kernel headers, but that did not fix the issue. I also uninstalled the stex kmod, but that has not solved it either.
Is there a way to launch the kernel with all the kmods installed to deduce where the issue is coming from? Thank you

rfs · June 11, 2024, 4:05pm

Some thoughts from a non-expert who has had a similar but not necessarily same cause problem:

1. Your attempt to reinstall the kernel-devel and kernel-headers packages appears to have happened on the old system while you were running the old kernel, so uname -r was returning 4.18.0-513.24.1.el8_9.x86_64, when what you needed was to install the versions for the new kernel 4.18.0-553.5.1.el8_10.x86_64 (which may of course already have been installed (?)). You can pick those explicitly rather than using uname -r, which returns the details of the old (running) kernel, and perhaps your comments indicate that you have already done exactly that (?).
2. What do you get from lspci -nn? In particular, can you find your video hardware there? If so, what are the details?
3. When you say that you get no “command line” after picking the new kernel, can you tell me exactly what you are expecting and not getting? (For me, after picking the new kernel from the grub menu, the next thing I get is either a pictorial please-wait screen (if quiet rhgb is on the kernel command line) or a list of progress reports (otherwise).)
4. What is kmod-stex used for?
5. (My hunch is that the ruby problem is not the issue, but I don’t really know.)
6. The screenshot of CPU package temperatures being too high might point to a hardware problem, but I’m pretty sure it’s not the issue currently.
7. I’d put my money on the missing nvidia-drm.ko.xz files. Have you checked whether they are present in /lib/modules/4.18.0-553.5.1.el8_10/…? (Use find to search all the directories below this, and check that your use of find is working by using it to find something you can definitely see you have.)
8. Alternatively, if the stex module is responsible for reading your disk, it could be something to do with that. When you get to the grub screen at which you choose your kernel, press Escape to give yourself more time, then pick the option for editing the command line, and insert a new line reading

echo 'About to load linux 553'

just before the line starting with linux, then another similar one just after that command, then another after the initrd command, and then execute it and see whether these get printed out, and then whether they disappear again or just get stuck with nothing else happening.
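A minimal sketch of the module check suggested above, assuming bash with GNU find (for -printf and -xtype). It takes the kernel release explicitly instead of relying on uname -r, which reports the running (older) kernel, and it marks symlinks whose target is missing, since on RHEL-family systems the weak-updates entries are normally symlinks. The function name list_nvidia_modules and its optional second argument are hypothetical, added only for illustration.

```shell
# List NVIDIA module files installed for one kernel release.
# Usage: list_nvidia_modules KERNEL_RELEASE [MODULES_ROOT]
#   KERNEL_RELEASE  directory name under /lib/modules, e.g. 4.18.0-553.el8_10.x86_64
#                   (given explicitly: uname -r would report the running kernel)
#   MODULES_ROOT    defaults to /lib/modules; hypothetical option for trying it out
list_nvidia_modules() {
    local kver="$1"
    local root="${2:-/lib/modules}"
    local dir="${root}/${kver}"
    if [ ! -d "$dir" ]; then
        echo "no module directory: $dir" >&2
        return 1
    fi
    # Quote the pattern so the shell does not expand it in the current directory.
    # %P prints each path relative to $dir (e.g. extra/nvidia.ko.xz).
    # -xtype l is true for a symlink whose target is missing.
    find "$dir" -name 'nvidia*' \
        \( -xtype l -printf '%P (broken symlink)\n' -o -printf '%P\n' \) |
        LC_ALL=C sort
}
```

Running it once for 4.18.0-513.24.1.el8_9.x86_64 and once for 4.18.0-553.el8_10.x86_64 and comparing the two lists shows which NVIDIA modules the new kernel lacks.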

I think that’s enough guessing for now - do let me know what you find from the above.
Roger.

rfs · June 11, 2024, 4:27pm

Here’s another thought. Did you go to Red Hat 8.6 NVIDIA Optimus and 12th Gen Intel Alder Lake Driver Installation — SPX Labs as suggested by your reference ? I.e. did you add something like GRUB_CMDLINE_LINUX=‘“i915.force_probe=46a8”’ (being very careful with the nested single and double quotes, and where 46a8 is to be replaced by the number after the word Device in the output of
lspci |grep VGA
) to the end of /etc/default/grub ? And then run grub2-mkconfig with the -o option pointing to the correct location of your grub.cfg file in /boot/grub2 ? (which may or may not be the location given in that article - mine is /boot/grub2/grub.cfg ) And then run
dnf install -y module nvidia-driver:latest-dkms
?

Roger.

chlowden · June 12, 2024, 7:44am

Hello RFS
Thank you for your interest in my problem.

I have installed and uninstalled 553 using dnf update without issue, so I figure that all is well.

lspci -nn
 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GLM [Quadro P1000 Mobile] [10de:1cbb] (rev a1)

Following the grub menu, the screen goes black and stays that way until I crash the machine. The keyboard is lit up but nothing else.
kmod-stex is a driver for mounting Promise Pegasus external hard drives. It is not at all essential for the machine.

cd /lib/modules/
(base) [admin@LOWROCKY modules]$ ls
4.18.0-348.el8.0.2.x86_64     4.18.0-477.21.1.el8_8.x86_64  4.18.0-513.11.1.el8_9.0.1.x86_64  4.18.0-513.18.1.el8_9.x86_64  4.18.0-513.5.1.el8_9.x86_64  4.18.0-553.5.1.el8_10.x86_64
4.18.0-477.10.1.el8_8.x86_64  4.18.0-477.27.1.el8_8.x86_64  4.18.0-513.11.1.el8_9.x86_64      4.18.0-513.24.1.el8_9.x86_64  4.18.0-513.9.1.el8_9.x86_64  4.18.0-553.el8_10.x86_64

All the folders above have files in them.

and the result

and it stops there and I eventually crash the machine.
But it seems to go through the grub sequence.
Maybe the issue is with these BIOS errors I get when I launch 513. I have installed the latest Dell BIOS. From what I have read on the web, other Linux OSes show similar errors, but they seem to be unimportant.
Then there is a “FAILED to start Load Kernel Modules” error on 513 that may be blocking 553.

rfs · June 12, 2024, 8:47am

Let me give you my email address, which is myfirstname dot sewell at cantab dot net.
Might you be able to send me (to my email rather than here) the output of
find /lib/modules -name '*nvidia*' -print
which might possibly be lengthy?
Can I check whether you have tried the solution given in my second post to you (starting “Here’s another thought”) ?
Please could you try point 8 in my first post to you; I understand that stex is not relevant, but another possibility is that there’s also an nvidia driver issue affecting grub somehow, which that might cast light on.
My similar problem turned out to be a change in the display driver module, which in my case was not anything to do with nvidia but rather the mgag200 module (which I’d guess you won’t have in your system). However, it’s worth mentioning because it turned out that huge changes had been made in this module between versions (there presumably was a good reason, but it does make one think “if it ain’t broke don’t fix it”). I eventually found the bug and got my fix accepted by Red Hat. It’s entirely possible that a pile of similar changes were made in the nvidia module for the same reason (whatever that was), and/or that a similar bug has been introduced. You can almost certainly easily download the source of the relevant modules; I can explain how if that’s of interest.

Roger.

iwalker · June 12, 2024, 10:01am

@rfs The forum exists for people to post and resolve problems. We recommend that all communication takes place here, and not via email, for two reasons. First, more people can see the problem and solution. Second, for security reasons, people may not wish to give away their email address, which could later be abused.

Even if the output from the modules is lengthy, the forum formatting tools allow such info to appear in a formatted way that is easy enough to read. I would say far easier than by email as well.

chlowden · June 12, 2024, 10:15am

Thank you so much for your offer.

find /lib/modules -name *nvidia* -print
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia-modeset.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia-drm.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia-uvm.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia-peermem.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia-drm.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia-modeset.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia-peermem.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia-uvm.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia-modeset.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia-drm.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia-uvm.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia-peermem.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/weak-updates/nvidia-peermem.ko.xz

I have tried it without much success:

GNU nano 2.9.8                  /etc/default/grub                             

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/rl_lowrocky00-swap rhgb quiet modprobe.blacklist=nouveau rd.driver.blacklist=nouveau"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
GRUB_CMDLINE_LINUX="i915.force_probe=10de:1cbb"
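A hedged note on the file above: grub2-mkconfig reads /etc/default/grub as shell variable assignments, so the second GRUB_CMDLINE_LINUX line replaces the first rather than adding to it. Once grub2-mkconfig is rerun, the kernel command line would be expected to lose crashkernel=auto, the resume= setting, rhgb quiet and the two nouveau blacklist options. Also, i915.force_probe expects the device ID of an Intel GPU; 10de:1cbb is the NVIDIA Quadro's vendor:device pair from lspci -nn, so it would not match anything the i915 driver handles. A minimal sketch of appending instead of overwriting, assuming an extra parameter is wanted at all (XXXX is a placeholder, not a real ID):

```shell
# /etc/default/grub (excerpt): keep the original line as it was ...
GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/rl_lowrocky00-swap rhgb quiet modprobe.blacklist=nouveau rd.driver.blacklist=nouveau"
# ... and, only if an extra parameter is really needed, append it to the
# existing value instead of assigning a new one. XXXX is a placeholder for an
# Intel GPU device ID; it is not a value taken from this machine.
GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} i915.force_probe=XXXX"
```

After editing the file, grub.cfg still has to be regenerated with grub2-mkconfig, as described earlier in the thread.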

Also, I followed the page and got more errors

sudo dnf install -y module nvidia-driver:latest-dkms
[sudo] password for admin: 
Last metadata expiration check: 3:41:40 ago on mer. 12 juin 2024 08:02:42 CEST.
No match for argument: module
No match for argument: nvidia-driver:latest-dkms
Error: Unable to find a match: module nvidia-driver:latest-dkms
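The “No match for argument: module” error above comes from the word order: in dnf, module is a command of its own, so it has to come directly after dnf, as in the Reddit instructions quoted earlier (dnf module install). A minimal sketch that prints the corrected command by default rather than running it; the DRY_RUN variable and run helper are hypothetical conveniences, not dnf features. Whether the nvidia-driver stream is then found still depends on the repository that provides it being enabled.

```shell
# Print the command instead of running it unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# "module" comes right after "dnf"; "install" is a subcommand of "dnf module".
run sudo dnf module install -y nvidia-driver:latest-dkms
```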

Your point 8 is my point 7 with the pictures (sorry for the confusion).
I’ve followed up the FAILED kernel modules error in the 513 boot, and it led to an error for a Paragon NTFS service that I had forgotten about.

systemctl status systemd-modules-load.service
systemd-modules-load.service - Load Kernel Modules
   Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2024-06-12 11:42:03 CEST; 16min ago
     Docs: man:systemd-modules-load.service(8)
           man:modules-load.d(5)
 Main PID: 1053 (code=exited, status=1/FAILURE)

juin 12 11:42:03 LOWROCKY.localdomain systemd[1]: Starting Load Kernel Modules...
juin 12 11:42:03 LOWROCKY.localdomain systemd-modules-load[1053]: Module 'msr' is builtin
juin 12 11:42:03 LOWROCKY.localdomain systemd-modules-load[1053]: Failed to insert 'uapfs': No such file or directory
juin 12 11:42:03 LOWROCKY.localdomain systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
juin 12 11:42:03 LOWROCKY.localdomain systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
juin 12 11:42:03 LOWROCKY.localdomain systemd[1]: Failed to start Load Kernel Modules.
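A hedged way to see where the failing uapfs entry comes from: systemd-modules-load reads module names, one per line, from *.conf files in /etc/modules-load.d, /run/modules-load.d and /usr/lib/modules-load.d. The function below (its name is hypothetical) searches those directories for a given module name. Modules can also be requested through the modules-load= kernel parameter, which this sketch does not check.

```shell
# Show which modules-load.d entries ask systemd-modules-load to load a module.
# Usage: find_modules_load_entry MODULE [DIR...]
# Without DIR arguments, the standard modules-load.d locations are searched.
find_modules_load_entry() {
    local module="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- /etc/modules-load.d /run/modules-load.d /usr/lib/modules-load.d
    fi
    local d
    for d in "$@"; do
        [ -d "$d" ] || continue
        # -H: print the file name; -x -F: the whole line must equal MODULE.
        grep -HxF -- "$module" "$d"/*.conf 2>/dev/null
    done
    return 0
}
```

For example, find_modules_load_entry uapfs prints each matching file:line pair, which identifies the file to remove or edit.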

I have uninstalled the Paragon service, and yet again I get NVIDIA errors.

(base) [admin@LOWROCKY ufsd_driver_Retail_Professional_lke_10.4.0_b1913]$ sudo ./uninstall.sh 
[sudo] password for admin: 
Uninstalling driver
depmod: ERROR: fstatat(4, nvidia-drm.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-modeset.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-peermem.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-uvm.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia.ko.xz): No such file or directory
Driver uninstalled!
Removing automount scripts
Would you like to uninstall HFS/NTFS utilities? [yes/no]
yes
 REMOVE   /usr/local/bin /usr/bin
HFS/NTFS utilities uninstalled!

But after a reboot, I still get a black screen.

Could there be something wrong with this GRUB command?

GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/rl_lowrocky00-swap rhgb quiet modprobe.blacklist=nouveau rd.driver.blacklist=nouveau"

rfs · June 12, 2024, 10:50am

@iwalker: Thank you for your policy statement. For me, I cannot access the forums on my main computer at all, I think because of local network security issues. I entirely agree that solutions should be posted, but am not convinced that the process of getting there via numerous misguesses is that helpful to others. While I’m happy to try to help, being only able to do so via forum messages is for me sufficiently inconvenient that there’s a much lower limit on how much effort I’m prepared to put in via that route. Hence my suggestion - which people are of course free to decline.

rfs · June 12, 2024, 11:04am

Having looked at what you’ve just posted, I have the following observations:

In your old setup you have (I think) the nvidia drivers overridden by ones that have been loaded from elsewhere, probably from kmod-nvidia… from elrepo (?). You could check that by running
yum list installed kmod-nvidia\*
under your old system. In the new system they are maybe not so overridden, and my guess is that you will not have the latest kmod-nvidia package installed for the new system (which is available from elrepo). You can check what is or isn’t overridden by looking in /etc/depmod.d . A possible reason for the complaints about missing nvidia modules is that depmod.d/nvidia…conf contains a command to override from weak-updates instead of from extra.
There may indeed be something missing from your grub kernel command line, I’m not sure. My current grub kernel command line also contains the phrases rd.lvm.lv=vg_revelationr9/root rd.lvm.lv=vg_revelationr9/swap (where vg_revelationr9 is the name of the LVM2 volume group my root partition sits in). Have you compared your new command line with the old one from which your working system runs ?
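The /etc/depmod.d check suggested above can be scripted roughly as follows (a sketch; the function name show_depmod_directives and its optional directory argument are hypothetical). It prints only the search and override directives, which decide which copy of a module depmod prefers when the same name exists in several subdirectories such as extra/ and weak-updates/: for search, the first directory listed has the highest priority. Other depmod.d directories can be passed as the argument.

```shell
# Print the search/override directives from depmod configuration files,
# each prefixed with the name of the file it comes from.
# Usage: show_depmod_directives [CONFIG_DIR]   (default: /etc/depmod.d)
show_depmod_directives() {
    local confdir="${1:-/etc/depmod.d}"
    local f line
    for f in "$confdir"/*.conf; do
        [ -e "$f" ] || continue   # no .conf files: the glob stays unexpanded
        # Comment lines start with '#', so they never match this pattern.
        grep -E '^[[:space:]]*(search|override)[[:space:]]' "$f" |
            while IFS= read -r line; do
                printf '%s: %s\n' "${f##*/}" "$line"
            done
    done
}
```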

Roger.

chlowden · June 12, 2024, 4:55pm

Thank you Roger for your thoughts

sudo yum list installed kmod-nvidia\*
[sudo] password for admin: 
Installed Packages
kmod-nvidia-latest-dkms.x86_64       3:555.42.02-1.el8        @cuda-rhel8-x86_64

I don’t have an nvidia file in the depmod.d folder:

[root@LOWROCKY depmod.d]# ls
dist.conf  kmod-stex.conf

NVIDIA driver management is probably the only thing that stops me encouraging my colleagues to move to Linux. Having two sources adds so much confusion. I still don’t understand why there is elrepo on top of the official NVIDIA driver. I think I have had more success with the NVIDIA driver than with elrepo, but I can no longer be sure. What I do know is that when I choose the wrong one, I waste a day making the machine work properly.
Maybe I should just uninstall everything NVIDIA and reinstall from one source (why am I even writing this, as I really don’t want to do it)?

rfs · June 12, 2024, 5:52pm

I think that if I were in your position I would consider the following steps:

1. Check that the actual command line you are running has something along the lines of root=/dev/mapper/vg_revelationr9-root ro
in it (I think it probably has, as I’ve tried removing this from mine, and I get a different set of symptoms from what you have).
2. Try adding a file to /etc/depmod.d called nvidia.conf containing
override nvidia * weak-updates/nvidia
then run
dracut -f --kver 4.18.0-553.el8_10.x86_64
which should remake the relevant initramfs file in /boot,
then reboot and see what happens. So long as you give the correct kernel version number, that should leave your existing system untouched, but if worried, make a copy of the initramfs file for your working system first.
3. If that doesn’t work, I’d repeat the above with weak-updates replaced by extra.
4. If that doesn’t work, I would install kmod-nvidia-550.90.07-1.el8_10.elrepo.x86_64.rpm from elrepo, then reboot and see what happens. Warning: it’s possible that doing that might make your new system work but stop your old system working; I don’t know.
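The depmod.d override and initramfs rebuild suggested above can be sketched as a short script, under these assumptions: bash, sudo rights, and KVER matching the directory name under /lib/modules, including the .x86_64 suffix. dracut takes the kernel version through --kver; a bare positional argument is read as the name of the image file to write. Three choices here differ from the text and are hedged: depmod is rerun for that kernel before dracut, because depmod.d settings only take effect when depmod rebuilds the module index; the override names the kernel version instead of *, so the working 513 kernel is left alone; and SUBDIR is plain weak-updates, because the find output earlier in the thread shows the modules directly under weak-updates/ and extra/ rather than in an nvidia/ subdirectory. The run helper only prints commands unless DRY_RUN=0 is set; it is a hypothetical convenience, not part of any tool.

```shell
# Hypothetical helper: print commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

KVER="4.18.0-553.el8_10.x86_64"   # kernel that does not boot (name under /lib/modules)
SUBDIR="weak-updates"             # the fallback in the post is "extra"

# Keep a copy of the current initramfs for that kernel.
run sudo cp -p "/boot/initramfs-${KVER}.img" "/boot/initramfs-${KVER}.img.bak"

# Prefer the nvidia module from SUBDIR, for this kernel version only.
# One override line covers one module name; nvidia-drm, nvidia-modeset etc.
# would each need their own line.
run sudo sh -c "echo 'override nvidia ${KVER} ${SUBDIR}' > /etc/depmod.d/nvidia.conf"

# Regenerate the module index for that kernel, then its initramfs.
run sudo depmod -a "$KVER"
run sudo dracut -f --kver "$KVER"
```

Run as is, it prints the four commands; with DRY_RUN=0 it executes them. According to the find listing above, 4.18.0-553.el8_10.x86_64 has nvidia.ko.xz only under extra/, so SUBDIR=extra may be the variant that matters for that kernel.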

But in any event, I’m not an expert here. It goes without saying that you need a copy of your whole system before trying such things !

Good luck,
Roger.
