Hello
I was wondering if anyone else is having a startup issue with kernel*-4.18.0-553.el8_10.x86_64
which has just become available?
The upgrade installed without issue, but when I pick this kernel the screen immediately goes black and there are no command lines (as I have with the older kernels), so I presume that the kernel does not launch. When I choose an older kernel, the boot-up is fine.
This is my second attempt to install this kernel so I figure it is not a download corruption. I was wondering if anyone else had a similar issue?
Thanks
You could test whether you can connect with ssh from another machine. If you can, then the issue is “only” in graphics.
The machine does not show up on my LAN. I don’t see any of the initial “code” as the OS is deployed. There is no keyboard or mouse. It looks like the BIOS does not let the OS start or gets stuck immediately.
Thanks for your idea
I have upgraded my Dell 7530 to the latest BIOS but that has not resolved it. Is there anything very particular about this kernel?
Are you using any kmod package from elrepo? If so, you’d need to update it when updating the kernel.
When trying to remove a kmod I have installed, I get this ruby error and from what I understand, ruby is quite important.
sudo dnf update --nobest
Last metadata expiration check: 0:03:12 ago on sam. 08 juin 2024 06:18:35 CEST.
Dependencies resolved.
Problem 1: package ruby-default-gems-3.3.1-2.module+el8.10.0+1827+16ecb9d2.noarch from appstream requires (rubygem(io-console) >= 0.5 with rubygem(io-console) < 1), but none of the providers can be installed
- cannot install the best update candidate for package rubygem-did_you_mean-1.2.0-111.module+el8.9.0+1536+5f79634e.noarch
- package rubygem-io-console-0.5.11-142.module+el8.10.0+1741+bdb5b6ca.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.11-142.module+el8.10.0+1741+bdb5b6ca.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.11-143.module+el8.10.0+1826+b62220b4.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.11-143.module+el8.10.0+1826+b62220b4.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.6-139.module+el8.9.0+1537+0b2034bd.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.6-139.module+el8.9.0+1537+0b2034bd.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.7-141.module+el8.6.0+1002+a7dba0ac.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.7-141.module+el8.6.0+1002+a7dba0ac.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.7-143.module+el8.10.0+1820+f3fffb92.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.7-143.module+el8.10.0+1820+f3fffb92.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.7.1-1.module+el8.10.0+1679+61871737.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.7.1-1.module+el8.10.0+1679+61871737.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.7.1-2.module+el8.10.0+1827+16ecb9d2.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.7.1-2.module+el8.10.0+1827+16ecb9d2.x86_64 from appstream is filtered out by modular filtering
Problem 2: package ruby-default-gems-3.3.1-2.module+el8.10.0+1827+16ecb9d2.noarch from appstream requires (rubygem(io-console) >= 0.5 with rubygem(io-console) < 1), but none of the providers can be installed
- cannot install the best update candidate for package rubygem-openssl-2.1.2-111.module+el8.9.0+1536+5f79634e.x86_64
- package rubygem-io-console-0.5.11-142.module+el8.10.0+1741+bdb5b6ca.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.11-142.module+el8.10.0+1741+bdb5b6ca.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.11-143.module+el8.10.0+1826+b62220b4.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.11-143.module+el8.10.0+1826+b62220b4.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.6-139.module+el8.9.0+1537+0b2034bd.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.6-139.module+el8.9.0+1537+0b2034bd.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.7-141.module+el8.6.0+1002+a7dba0ac.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.7-141.module+el8.6.0+1002+a7dba0ac.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.7-143.module+el8.10.0+1820+f3fffb92.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.5.7-143.module+el8.10.0+1820+f3fffb92.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.7.1-1.module+el8.10.0+1679+61871737.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.7.1-1.module+el8.10.0+1679+61871737.x86_64 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.7.1-2.module+el8.10.0+1827+16ecb9d2.i686 from appstream is filtered out by modular filtering
- package rubygem-io-console-0.7.1-2.module+el8.10.0+1827+16ecb9d2.x86_64 from appstream is filtered out by modular filtering
==========================================================================================================================================================================================
Package Architecture Version Repository Size
==========================================================================================================================================================================================
Installing:
kernel x86_64 4.18.0-553.5.1.el8_10 baseos 10 M
kernel-core x86_64 4.18.0-553.5.1.el8_10 baseos 43 M
kernel-devel x86_64 4.18.0-553.5.1.el8_10 baseos 24 M
kernel-modules x86_64 4.18.0-553.5.1.el8_10 baseos 36 M
Upgrading:
bpftool x86_64 4.18.0-553.5.1.el8_10 baseos 11 M
cockpit x86_64 310.4-1.el8_10 baseos 89 k
cockpit-bridge x86_64 310.4-1.el8_10 baseos 500 k
cockpit-system noarch 310.4-1.el8_10 baseos 5.2 M
cockpit-ws x86_64 310.4-1.el8_10 baseos 982 k
fuse x86_64 2.9.7-19.el8 baseos 82 k
fuse-common x86_64 3.3.0-19.el8 baseos 21 k
fuse-libs x86_64 2.9.7-19.el8 baseos 102 k
kernel-headers x86_64 4.18.0-553.5.1.el8_10 baseos 12 M
kernel-tools x86_64 4.18.0-553.5.1.el8_10 baseos 11 M
kernel-tools-libs x86_64 4.18.0-553.5.1.el8_10 baseos 10 M
libxml2 x86_64 2.9.7-18.el8_10.1 baseos 696 k
microsoft-edge-stable x86_64 125.0.2535.92-1 microsoft-edge 162 M
python3-libxml2 x86_64 2.9.7-18.el8_10.1 baseos 237 k
python3-perf x86_64 4.18.0-553.5.1.el8_10 baseos 10 M
Removing:
kernel x86_64 4.18.0-513.18.1.el8_9 @baseos 0
kernel-core x86_64 4.18.0-513.18.1.el8_9 @baseos 71 M
kernel-devel x86_64 4.18.0-513.18.1.el8_9 @baseos 53 M
kernel-modules x86_64 4.18.0-513.18.1.el8_9 @baseos 25 M
Skipping packages with broken dependencies:
ruby-default-gems noarch 3.3.1-2.module+el8.10.0+1827+16ecb9d2 appstream 84 k
Transaction Summary
==========================================================================================================================================================================================
Install 4 Packages
Upgrade 15 Packages
Remove 4 Packages
Skip 1 Package
Total download size: 338 M
Is this ok [y/N]:
But sadly this reboot has not resolved the issue. Is this ruby error of any consequence?
Maybe more pertinent … When I start up in 4.18.0-513.24.1 I get a long list of errors before the OS loads. I have had these errors for a long time, even after having reinstalled the OS completely, and they have not affected my usage of the desktop.
After this list, a DELL logo appears and all is well.
But maybe these errors are stopping 553 from loading?
Concerning kmod, I have reinstalled kmod-stex x86_64 6.02.0000.01-3.el8_10.elrepo but I am now getting nvidia errors that I don’t remember seeing before. Should I ask elrepo to recompile stex for 553?
Many thanks
sudo dnf install kmod-stex
RPM Fusion for EL 8 - Free tainted 119 kB/s | 7.8 kB 00:00
Dependencies resolved.
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
kmod-stex x86_64 6.02.0000.01-3.el8_10.elrepo elrepo 32 k
Transaction Summary
================================================================================
Install 1 Package
Total download size: 32 k
Installed size: 67 k
Is this ok [y/N]: y
Downloading Packages:
kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64.r 147 kB/s | 32 kB 00:00
--------------------------------------------------------------------------------
Total 51 kB/s | 32 kB 00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64 1/1
Running scriptlet: kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64 1/1
depmod: ERROR: fstatat(4, nvidia-drm.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-modeset.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-peermem.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-uvm.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia.ko.xz): No such file or directory
depmod: WARNING: /lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/uapfs.ko.xz needs unknown symbol ufsd_readahead_gfp_mask
depmod: WARNING: /lib/modules/4.18.0-553.el8_10.x86_64/kernel/external/uapfs/uapfs.ko needs unknown symbol ufsd_readahead_gfp_mask
Verifying : kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64 1/1
Installed:
kmod-stex-6.02.0000.01-3.el8_10.elrepo.x86_64
kmod-stex-6.02.0000.01-3.el8_10.elrepo was indeed built against the 8.10 GA kernel-4.18.0-553.el8_10. So that’s the one to be used with EL8.10.
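One way to see for yourself which kernel a kmod package targets is to look at the /lib/modules directory its files are installed under. The following is a minimal sketch, assuming the file list comes from `rpm -ql`; the helper name `kmod_kernel_dirs` is hypothetical and not part of any package.

```shell
# Print the distinct kernel versions that appear in a list of file paths
# read from stdin (one path per line). Paths outside /lib/modules are ignored.
kmod_kernel_dirs() {
    sed -n 's|^/lib/modules/\([^/]*\)/.*|\1|p' | sort -u
}

# Intended use on the affected machine (needs the package installed):
#   rpm -ql kmod-stex | kmod_kernel_dirs
```

If the only version printed is 4.18.0-553.el8_10.x86_64, the package was built for the 8.10 kernel series, and older kernels would only see it through weak-updates links.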
Concerning the Nvidia errors, I found the page below that seems to be a very similar issue to mine …
https://www.reddit.com/r/redhat/comments/xuvxb8/nvidia_driver_issue/
It suggests reinstalling the NVIDIA drivers and the kernel-devel and kernel-headers packages
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
sudo dnf module install -y nvidia-driver:latest-dkms
reboot
but when I do it, I get this error …
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
Last metadata expiration check: 1:17:02 ago on dim. 09 juin 2024 07:01:15 CEST.
Package kernel-devel-4.18.0-513.24.1.el8_9.x86_64 is already installed.
No match for argument: kernel-headers-4.18.0-513.24.1.el8_9.x86_64
Error: Unable to find a match: kernel-headers-4.18.0-513.24.1.el8_9.x86_64
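The "No match" above is for the version of the running kernel, which is what `$(uname -r)` expands to; the earlier update transcript shows kernel-headers being upgraded to the 553.5.1 release, so the older headers are probably no longer offered by the repository. A minimal sketch of naming the packages for an explicit release instead follows. `kernel_build_pkgs` is a hypothetical helper, and the release string is the newer kernel from that transcript.

```shell
# Build kernel-devel / kernel-headers package names for a given kernel release,
# instead of relying on $(uname -r), which always names the running kernel.
kernel_build_pkgs() {
    kver=$1
    printf 'kernel-devel-%s kernel-headers-%s\n' "$kver" "$kver"
}

# Example: print the names for the new kernel while still booted into the old one.
kernel_build_pkgs 4.18.0-553.5.1.el8_10.x86_64

# On the real system this could feed dnf (not run here):
#   sudo dnf install $(kernel_build_pkgs 4.18.0-553.5.1.el8_10.x86_64)
```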
I have forced a 553 kernel-headers install, but that did not repair the issue. I also uninstalled the stex kmod, but that has not solved it either.
Is there a way to launch the kernel with all the kmods installed to deduce where the issue is coming from? Thank you
Some thoughts from a non-expert who has had a similar problem, though not necessarily with the same cause:
- Your attempt to reinstall the kernel-devel and kernel-headers packages appears to have happened on the old system while you were running the old kernel, so uname -r was returning 4.18.0-513.24.1.el8_9.x86_64, when what you needed was to install the versions for the new kernel 4.18.0-553.5.1.el8_10.x86_64 (which may of course already have been installed (?)). You can pick those explicitly rather than using uname -r, which returns the details of the old (running) kernel, and perhaps your comments indicate that you have already done exactly that (?).
- What do you get returned from lspci -nn? In particular, can you find your video hardware there? If so, what are the details?
- When you say that you get no “command line” after picking the new kernel, can you tell me exactly what you are expecting and not getting? (For me, after picking the new kernel from the grub menu, the next things I get are either a pictorial please-wait screen (if quiet rhgb is in the grub command line) or a list of progress reports (otherwise).)
- What is kmod-stex used for?
- (My hunch is that the ruby problem is not the issue, but I don’t really know.)
- The screenshot of CPU package temperatures being too high might be a hardware problem but I’m pretty sure it’s not the issue currently.
- I’d put my money on the missing nvidia-drm.ko.xz files. Have you checked whether they are present in /lib/modules/4.18.0-553.5.1.el8_10/… ? (Use find to search all the directories below this, and check that your use of find is working by using it to find something you definitely can see you have got.)
- Alternatively, if the stex module is responsible for reading your disk, it could be something to do with that. When you get to the grub screen at which you choose your kernel, press Escape to give yourself more time, then pick the option for editing the command line, and insert a new line reading
echo 'About to load linux 553'
just before the line starting linux, then another similar just after that command, then another after the initrd command, and then execute it and see whether these get printed out, and then whether they disappear again or just get stuck with nothing else happening.
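The find check in the list above can be sketched as a per-kernel count, which makes it easy to spot a kernel directory that has no nvidia modules at all. This is only an illustration: `count_modules_per_kernel` is a hypothetical helper, and the pattern is quoted so that the shell passes it to find unexpanded.

```shell
# For every kernel directory under a modules root, report how many files
# (including symlinks, as used by weak-updates) match a name pattern.
count_modules_per_kernel() {
    root=$1
    pattern=$2
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        n=$(find "$dir" -name "$pattern" | wc -l | tr -d ' ')
        printf '%s %s\n' "$(basename "$dir")" "$n"
    done
}

# Intended use on the affected machine:
#   count_modules_per_kernel /lib/modules '*nvidia*'
# Sanity check with something known to exist:
#   count_modules_per_kernel /lib/modules 'modules.dep'
```

A kernel that reports 0 for '*nvidia*' but 1 for 'modules.dep' has a populated module tree without the nvidia driver in it.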
I think that’s enough guessing for now - do let me know what you find from the above.
Roger.
Here’s another thought. Did you go to Red Hat 8.6 NVIDIA Optimus and 12th Gen Intel Alder Lake Driver Installation — SPX Labs as suggested by your reference ? I.e. did you add something like GRUB_CMDLINE_LINUX=‘“i915.force_probe=46a8”’ (being very careful with the nested single and double quotes, and where 46a8 is to be replaced by the number after the word Device in the output of
lspci |grep VGA
) to the end of /etc/default/grub ? And then run grub2-mkconfig with the -o option pointing to the correct location of your grub.cfg file in /boot/grub2 ? (which may or may not be the location given in that article - mine is /boot/grub2/grub.cfg ) And then run
dnf install -y module nvidia-driver:latest-dkms
?
Roger.
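A hedged sketch of reading the ID from `lspci -nn` output, where it appears as a bracketed vendor:device pair. `pci_ids` is a hypothetical helper. As far as I know, i915 is the Intel graphics driver, so its force_probe option is meant for an Intel device ID (vendor 8086); a pair whose vendor part is 10de belongs to an NVIDIA device and would not be a meaningful value there.

```shell
# Extract "vendor device" from lines of `lspci -nn` output read on stdin.
# The pair is the last bracketed group of two 4-digit hex numbers on a line.
pci_ids() {
    sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*/\1 \2/p'
}

# Intended use on the affected machine:
#   lspci -nn | grep VGA | pci_ids
```

The first field of the result is the vendor and the second is the device; only the device part would go after `i915.force_probe=`, and only for an Intel GPU.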
Hello RFS
Thank you for your interest in my problem.
- I have installed and uninstalled 553 using dnf update without issue, so I figure that all is well.
lspci -nn
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GLM [Quadro P1000 Mobile] [10de:1cbb] (rev a1)
- Following the grub menu, the screen goes black and stays that way until I crash the machine. The keyboard is lit up but nothing else.
- kmod-stex is a driver for mounting Promise Pegasus external hard drives. It is not at all essential for the machine.
cd /lib/modules/
(base) [admin@LOWROCKY modules]$ ls
4.18.0-348.el8.0.2.x86_64 4.18.0-477.21.1.el8_8.x86_64 4.18.0-513.11.1.el8_9.0.1.x86_64 4.18.0-513.18.1.el8_9.x86_64 4.18.0-513.5.1.el8_9.x86_64 4.18.0-553.5.1.el8_10.x86_64
4.18.0-477.10.1.el8_8.x86_64 4.18.0-477.27.1.el8_8.x86_64 4.18.0-513.11.1.el8_9.x86_64 4.18.0-513.24.1.el8_9.x86_64 4.18.0-513.9.1.el8_9.x86_64 4.18.0-553.el8_10.x86_64
All the folders above have files in them.
and the result
and it stops there and I eventually crash the machine.
But it seems to go through the grub sequence.
Maybe the issue is with these BIOS errors I get when I launch 513. I have installed the latest Dell BIOS. From what I have read on the web, other Linux OSs have similar errors but they seem to be unimportant.
Then there is a FAILED to start Load Kernel Modules on 513 that may be blocking 553
- Let me give you my email address which is myfirstname dot sewell at cantab dot net,
- Might you be able to send me (to my email rather than here) the output of
find /lib/modules -name '*nvidia*' -print
which might possibly be lengthy?
- Can I check whether you have tried the solution given in my second post to you (starting “Here’s another thought”)?
- Please could you try point 8 in my first post to you; I understand that stex is not relevant, but another possibility is that there’s also an nvidia driver issue affecting grub somehow, which that might cast light on.
- My similar problem turned out to be a change in the display driver module, which in my case was not anything to do with nvidia but rather the mgag200 module (which I’d guess you won’t have in your system). However, it’s worth mentioning because it turned out that huge changes had been made in this module between versions (there presumably was a good reason, but it does make one think “if it ain’t broke don’t fix it”). I eventually found the bug and got my fix accepted by RedHat. It’s entirely possible that a pile of similar changes were made in the nvidia module for the same reason (whatever that was), and/or that a similar bug has been introduced. You can almost certainly easily download the source of the relevant modules - I can explain how if that’s of interest.
Roger.
@rfs The forum exists for people to post and resolve problems. We recommend that all communication takes place here, and not via email, for two reasons. First, more people can see the problem and solution. Second, for security reasons, people may not wish to give away their email address only for it to be abused later.
Even if the output from the modules is lengthy, the forum formatting tools allow such info to appear in a formatted way that is easy enough to read. I would say far easier than by email as well.
- Thank you so much for your offer.
find /lib/modules -name *nvidia* -print
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia-modeset.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia-drm.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia-uvm.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/extra/nvidia-peermem.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia-drm.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia-modeset.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia-peermem.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia-uvm.ko.xz
/lib/modules/4.18.0-513.24.1.el8_9.x86_64/weak-updates/nvidia.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia-modeset.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia-drm.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia-uvm.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/extra/nvidia-peermem.ko.xz
/lib/modules/4.18.0-553.el8_10.x86_64/weak-updates/nvidia-peermem.ko.xz
- I have tried it without much success
GNU nano 2.9.8 /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/rl_lowrocky00-swap rhgb quiet modprobe.blacklist=nouveau rd.driver.blacklist=nouveau"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
GRUB_CMDLINE_LINUX="i915.force_probe=10de:1cbb"
Also, I followed the page and got more errors
sudo dnf install -y module nvidia-driver:latest-dkms
[sudo] password for admin:
Last metadata expiration check: 3:41:40 ago on mer. 12 juin 2024 08:02:42 CEST.
No match for argument: module
No match for argument: nvidia-driver:latest-dkms
Error: Unable to find a match: module nvidia-driver:latest-dkms
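The two "No match for argument" lines suggest that dnf read `module` and `nvidia-driver:latest-dkms` as package names, because `install` came first. In the form quoted earlier in this thread, `module` precedes `install`. A small sketch that assembles the command in that order follows; `module_install_cmd` is a hypothetical helper and it only prints the command rather than running it.

```shell
# Print a dnf command that installs a module stream.
# $1 is a module:stream spec such as nvidia-driver:latest-dkms.
module_install_cmd() {
    printf 'dnf module install -y %s\n' "$1"
}

module_install_cmd nvidia-driver:latest-dkms
```

Whether that stream is available still depends on which NVIDIA repository is enabled on the machine.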
- Your point 8 is my point 7 with the pictures (sorry for the confusion).
- I’ve followed up the FAILED kernel error in the 513 boot, and an error for a Paragon NTFS service that I had forgotten about.
systemctl status systemd-modules-load.service
systemd-modules-load.service - Load Kernel Modules
Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2024-06-12 11:42:03 CEST; 16min ago
Docs: man:systemd-modules-load.service(8)
man:modules-load.d(5)
Main PID: 1053 (code=exited, status=1/FAILURE)
juin 12 11:42:03 LOWROCKY.localdomain systemd[1]: Starting Load Kernel Modules...
juin 12 11:42:03 LOWROCKY.localdomain systemd-modules-load[1053]: Module 'msr' is builtin
juin 12 11:42:03 LOWROCKY.localdomain systemd-modules-load[1053]: Failed to insert 'uapfs': No such file or directory
juin 12 11:42:03 LOWROCKY.localdomain systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
juin 12 11:42:03 LOWROCKY.localdomain systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
juin 12 11:42:03 LOWROCKY.localdomain systemd[1]: Failed to start Load Kernel Modules.
I have uninstalled the paragon service and yet again I get NVIDIA errors.
(base) [admin@LOWROCKY ufsd_driver_Retail_Professional_lke_10.4.0_b1913]$ sudo ./uninstall.sh
[sudo] password for admin:
Uninstalling driver
depmod: ERROR: fstatat(4, nvidia-drm.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-modeset.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-peermem.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-uvm.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia.ko.xz): No such file or directory
Driver uninstalled!
Removing automount scripts
Would you like to uninstall HFS/NTFS utilities? [yes/no]
yes
REMOVE /usr/local/bin /usr/bin
HFS/NTFS utilities uninstalled!
But after a reboot, I still get a black screen.
Could there be something wrong with this GRUB command?
GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/rl_lowrocky00-swap rhgb quiet modprobe.blacklist=nouveau rd.driver.blacklist=nouveau"
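One thing that stands out in the /etc/default/grub listing earlier in this post is that GRUB_CMDLINE_LINUX is assigned twice. Since grub2-mkconfig reads that file as shell, the later assignment would replace the earlier one, which would drop the resume and nouveau blacklist options. The sketch below shows one way to check this; both helper names are hypothetical.

```shell
# Print the value GRUB_CMDLINE_LINUX ends up with after the file is sourced
# as shell. Runs in a subshell so the caller's environment is not changed.
effective_cmdline() {
    ( GRUB_CMDLINE_LINUX=; . "$1" >/dev/null 2>&1; printf '%s\n' "$GRUB_CMDLINE_LINUX" )
}

# Count how many times the variable is assigned in the file.
count_cmdline_assignments() {
    grep -c '^GRUB_CMDLINE_LINUX=' "$1"
}

# Intended use on the affected machine:
#   count_cmdline_assignments /etc/default/grub
#   effective_cmdline /etc/default/grub
```

If the count is greater than 1, keeping a single GRUB_CMDLINE_LINUX line that holds all the options avoids losing any of them.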
@iwalker: Thank you for your policy statement. For me, I cannot access the forums on my main computer at all, I think because of local network security issues. I entirely agree that solutions should be posted, but am not convinced that the process of getting there via numerous misguesses is that helpful to others. While I’m happy to try to help, being only able to do so via forum messages is for me sufficiently inconvenient that there’s a much lower limit on how much effort I’m prepared to put in via that route. Hence my suggestion - which people are of course free to decline.
Having looked at what you’ve just posted, I have the following observations:
- In your old setup you have (I think) the nvidia drivers overridden by ones that have been loaded from elsewhere, probably from kmod-nvidia… from elrepo (?). You could check that by running
yum list installed kmod-nvidia\*
under your old system. In the new system they are maybe not so overridden, and my guess is that you will not have the latest kmod-nvidia package installed for the new system (which is available from elrepo). You can check what is or isn’t overridden by looking in /etc/depmod.d . A possible reason for the complaints about missing nvidia modules is that depmod.d/nvidia…conf contains a command to override from weak-updates instead of from extra.
- There may indeed be something missing from your grub kernel command line, I’m not sure. My current grub kernel command line also contains the phrases rd.lvm.lv=vg_revelationr9/root rd.lvm.lv=vg_revelationr9/swap (where vg_revelationr9 is the name of the LVM2 volume group my root partition sits in). Have you compared your new command line with the old one from which your working system runs?
Roger.
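The command-line comparison suggested above can be sketched as a small option diff. `missing_opts` is a hypothetical helper; it assumes the options are separated by spaces and contain no shell glob characters.

```shell
# Print each option that is present in the first command line
# but absent from the second, one per line.
missing_opts() {
    for opt in $1; do
        case " $2 " in
            *" $opt "*) ;;                      # also present in the second line
            *) printf '%s\n' "$opt" ;;
        esac
    done
}

# Intended use: take the first string from /proc/cmdline on the working boot
# and the second from the args= line that `grubby --info=ALL` prints for the
# new kernel, then run:
#   missing_opts "$old_cmdline" "$new_cmdline"
```

Anything printed is an option the working kernel boots with that the new entry lacks, such as a root= or rd.lvm.lv= setting.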
Thank you Roger for your thoughts
sudo yum list installed kmod-nvidia\*
[sudo] password for admin:
Installed Packages
kmod-nvidia-latest-dkms.x86_64 3:555.42.02-1.el8 @cuda-rhel8-x86_64
I don’t have a nvidia file in the depmod.d folder
[root@LOWROCKY depmod.d]# ls
dist.conf kmod-stex.conf
NVIDIA driver management is probably the only reason why I encourage my colleagues to move to linux. Having two sources for the driver adds so much confusion. I still don’t understand why there is elrepo on top of the official NVIDIA driver. I think I have had more success with the NVIDIA driver than elrepo, but I can no longer be sure. What I do know is that when I choose the wrong one, I waste a day making the machine work properly.
Maybe I should just uninstall all NVIDIA and reinstall from one source (why am I writing this, as I really don’t want to do it)?
I think that if I were in your position I would consider the following steps:
- Check that the actual command line you are running has something on the lines of root=/dev/mapper/vg_revelationr9-root ro in it (I think it probably has, as I’ve tried removing this from mine, and I get a different set of symptoms from what you have).
- Try adding a file to /etc/depmod.d called nvidia.conf containing
override nvidia * weak-updates/nvidia
then run
dracut --force --kver 4.18.0-553.el8_10.x86_64
which should remake the relevant initramfs file in /boot, then reboot and see what happens. So long as you give the correct kernel version number that should leave your existing system untouched, but if worried then make a copy of the initramfs file for your working system first.
- If that doesn’t work I’d repeat the above with weak-updates replaced by extra.
- If that doesn’t work, I would install kmod-nvidia-550.90.07-1.el8_10.elrepo.x86_64.rpm from elrepo, then reboot and see what happens. Warning: it’s possible that doing that might make your new system work but stop your old system working - I don’t know.
But in any event, I’m not an expert here. It goes without saying that you need a copy of your whole system before trying such things !
Good luck,
Roger.
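The override-and-rebuild steps above can be sketched as a helper that writes the override file and prints the follow-up commands instead of running them. `write_nvidia_override` is a hypothetical name, the backup file name is my own choice, and the explicit depmod step is an assumption on my part (the override only takes effect once depmod has run for that kernel).

```shell
# Write a depmod override file into a directory and print the commands that
# would back up and rebuild the initramfs for one kernel version.
#   $1 = config directory (/etc/depmod.d on the real system)
#   $2 = module subdirectory to prefer: weak-updates or extra
#   $3 = full kernel version, including the architecture suffix
write_nvidia_override() {
    confdir=$1
    subdir=$2
    kver=$3
    printf 'override nvidia * %s/nvidia\n' "$subdir" > "$confdir/nvidia.conf"
    printf 'cp -a /boot/initramfs-%s.img /boot/initramfs-%s.img.bak\n' "$kver" "$kver"
    printf 'depmod -a %s\n' "$kver"
    printf 'dracut --force --kver %s\n' "$kver"
}

# Intended use on the affected machine (as root), then run the printed commands:
#   write_nvidia_override /etc/depmod.d weak-updates 4.18.0-553.el8_10.x86_64
```

Printing the commands rather than executing them leaves room to review them first, and the second attempt only needs the subdirectory argument changed to extra.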