Nvidia driver install Rocky 8 - how to?

Hi
I followed the Nvidia tutorial here
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

It did not work, so I removed everything again. Then I did this:

sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
sudo dnf install nvidia-driver nvidia-settings
sudo dnf install cuda-driver
#  Reboot

But nvidia-smi shows this error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

How can I fix this?

Hello @catbears, take a peek here and let us know if it helps.

"nvidia drivers

It is highly recommended to avoid using nvidia’s “official” driver and use either rpmfusion or elrepo. This is due to how they package the driver and make it near impossible to run a newer kernel without waiting for a new package.

Note that elrepo only supports the nvidia driver for EL8."

Ok great. I went here and they say to do this

sudo dnf clean all
sudo dnf module disable nvidia-driver
sudo dnf -y install cuda

The system then fails at installing cuda:

$ sudo dnf -y install cuda
Error: 
 Problem: package cuda-12-1-12.1.1-1.x86_64 requires cuda-runtime-12-1 >= 12.1.1, but none of the providers can be installed
  - package cuda-12.1.1-1.x86_64 requires cuda-12-1 >= 12.1.1, but none of the providers can be installed
  - package cuda-runtime-12-1-12.1.1-1.x86_64 requires cuda-drivers >= 530.30.02, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package cuda-drivers-530.30.02-1.x86_64 is filtered out by modular filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

Looking at this site, ELRepo | kmod-nvidia: it’s from 2015 and either does not work or I cannot get it to work. There is also the issue that I won’t use the Tesla K80 for X but for computation. Is there any other documentation out there?

Can you share the details of the video card: model and version?

$ lspci | grep K80
86:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
87:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

Not so out of place with ChatGPT

“Well, this is where ShellGPT steps into the picture. It brings the power of ChatGPT right to your Linux Terminal. ShellGPT provides answers, intelligent suggestions, and writes commands and code for you, helping you navigate the command line efficiently. In this article, we will show you how to install the ShellGPT command line tool and use a ChatGPT-like AI bot in the Linux command line.”

If you go to NVidia’s driver search page, it will tell you that Kepler-generation Teslas are supported by driver version 460. That is from a trivial search; with a bit more searching, 470 hopefully supports the K80 too. The “latest” driver on NVidia’s search seems to be 535, but their repo has 530.

One way to check that is to see what ELRepo says with:

dnf install elrepo-release
dnf install nvidia-detect
nvidia-detect

The last command shows which driver version from ELRepo ought to support the card. The result is more likely kmod-nvidia-470xx than kmod-nvidia.
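
To compare that result with the stream names in NVidia's repo, the branch number can be pulled out of the nvidia-detect output. This is only a sketch: detect_to_stream is a made-up helper name, it assumes ELRepo's legacy packages are named kmod-nvidia-<branch>xx, and mapping plain kmod-nvidia to the "latest" stream is my assumption, not something either repo documents.

```shell
# Hypothetical helper: read nvidia-detect output on stdin and print a
# candidate stream name. Assumes legacy packages look like
# kmod-nvidia-<branch>xx; plain kmod-nvidia is mapped to "latest"
# (an assumption). Other lines, such as warnings, are ignored.
detect_to_stream() {
    sed -n -e 's/^kmod-nvidia-\([0-9][0-9]*\)xx$/\1/p' \
           -e 's/^kmod-nvidia$/latest/p' | head -n 1
}

# Usage on a real machine (not run here):
#   nvidia-detect 2>/dev/null | detect_to_stream
```

The helper only reformats text; whether the matching stream really supports the card still has to be checked by loading the kernel module.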


CUDA does require NVidia’s graphics drivers to talk to the card. The CUDA runtime used to be very strict about driver versions, but has lately given more leeway. Nevertheless, it is probably better to use NVidia’s packages than the ELRepo version. So, if we do not need the ELRepo repository, at least disable it or remove its definition:

dnf config-manager --disable elrepo   # just disable, hide, repo -- dnf will not use that repo
dnf remove elrepo-release             # totally remove the definition -- dnf does not know about ELrepo

The question is: which versions does NVidia’s repo offer?

dnf module list nvidia-driver

(I don’t have el8 to test on.)

Funny thing, I was taking over the ChatGPT conversation. Like the modern ‘let me google that for you’ :smile:

dnf module list nvidia-driver gives back this list

cuda-rhel8-x86_64
Name                       Stream                       Profiles                                 Summary                                         
nvidia-driver              latest                       default [d], fm, ks, src                 Nvidia driver for latest branch                 
nvidia-driver              latest-dkms [d]              default [d], fm, ks                      Nvidia driver for latest-dkms branch            
nvidia-driver              open-dkms                    default [d], fm, ks, src                 Nvidia driver for open-dkms branch              
nvidia-driver              418                          default [d], fm, ks, src                 Nvidia driver for 418 branch                    
nvidia-driver              418-dkms                     default [d], fm, ks                      Nvidia driver for 418-dkms branch               
nvidia-driver              440                          default [d], fm, ks, src                 Nvidia driver for 440 branch                    
nvidia-driver              440-dkms                     default [d], fm, ks                      Nvidia driver for 440-dkms branch               
nvidia-driver              450                          default [d], fm, ks, src                 Nvidia driver for 450 branch                    
nvidia-driver              450-dkms                     default [d], fm, ks                      Nvidia driver for 450-dkms branch               
nvidia-driver              455                          default [d], fm, ks, src                 Nvidia driver for 455 branch                    
nvidia-driver              455-dkms                     default [d], fm, ks                      Nvidia driver for 455-dkms branch               
nvidia-driver              460                          default [d], fm, ks, src                 Nvidia driver for 460 branch                    
nvidia-driver              460-dkms                     default [d], fm, ks                      Nvidia driver for 460-dkms branch               
nvidia-driver              465                          default [d], fm, ks, src                 Nvidia driver for 465 branch                    
nvidia-driver              465-dkms                     default [d], fm, ks                      Nvidia driver for 465-dkms branch               
nvidia-driver              470                          default [d], fm, ks, src                 Nvidia driver for 470 branch                    
nvidia-driver              470-dkms                     default [d], fm, ks                      Nvidia driver for 470-dkms branch               
nvidia-driver              495                          default [d], fm, ks, src                 Nvidia driver for 495 branch                    
nvidia-driver              495-dkms                     default [d], fm, ks                      Nvidia driver for 495-dkms branch               
nvidia-driver              510                          default [d], fm, ks, src                 Nvidia driver for 510 branch                    
nvidia-driver              510-dkms                     default [d], fm, ks                      Nvidia driver for 510-dkms branch               
nvidia-driver              515                          default [d], fm, ks, src                 Nvidia driver for 515 branch                    
nvidia-driver              515-dkms                     default [d], fm, ks                      Nvidia driver for 515-dkms branch               
nvidia-driver              515-open                     default [d], fm, ks, src                 Nvidia driver for 515-open branch               
nvidia-driver              520                          default [d], fm, ks, src                 Nvidia driver for 520 branch                    
nvidia-driver              520-dkms                     default [d], fm, ks                      Nvidia driver for 520-dkms branch               
nvidia-driver              520-open                     default [d], fm, ks, src                 Nvidia driver for 520-open branch               
nvidia-driver              525                          default [d], fm, ks, src                 Nvidia driver for 525 branch                    
nvidia-driver              525-dkms                     default [d], fm, ks                      Nvidia driver for 525-dkms branch               
nvidia-driver              525-open                     default [d], fm, ks, src                 Nvidia driver for 525-open branch               
nvidia-driver              530 [e]                      default [d] [i], fm, ks, src             Nvidia driver for 530 branch                    
nvidia-driver              530-dkms                     default [d], fm, ks                      Nvidia driver for 530-dkms branch               
nvidia-driver              530-open                     default [d], fm, ks, src                 Nvidia driver for 530-open branch               

Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
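
In a listing this long, a small filter can pull out only the stream marked [e]. A minimal sketch, assuming the column layout shown in this output; enabled_stream is a made-up helper name, not a dnf feature.

```shell
# Hypothetical helper: read "dnf module list nvidia-driver" output on stdin
# and print the stream whose third column carries the [e] (enabled) flag.
# Assumes the layout shown above: name, stream, optional flags, profiles.
enabled_stream() {
    awk '$1 == "nvidia-driver" && index($3, "[e]") > 0 { print $2 }'
}

# Usage on a real machine (not run here):
#   dnf module list nvidia-driver | enabled_stream
```

If it prints nothing, no stream is enabled (for example after dnf module reset or dnf module disable).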

It seems that 530 is installed from my earlier efforts, but then there is still:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Edit:

$ nvidia-detect
kmod-nvidia-470xx
WARNING: Xorg log file /var/log/Xorg.0.log does not exist
WARNING: Unable to determine Xorg ABI compatibility
WARNING: The driver for this device does not support the current Xorg version

So I don’t know. Should I change the installed/enabled stream to 460, or should I take it from the website with the ChatGPT solution, with the version Cphusion shows?

It doesn’t look like you enabled the 460 driver. Can you try resetting the driver module and then enabling it?

dnf module reset nvidia-driver
dnf module enable nvidia-driver:460
dnf module install nvidia-driver:460

Then reboot and run nvidia-smi again?

The ELRepo’s utility thinks that their 470 legacy driver supports your card.

It is clear that the “latest” 530 does not. Yes, you can install the packages, but the kernel module fails to load because the card is “unknown” to it.
You can probably run modprobe nvidia and check dmesg -T for the error messages.
Furthermore, you get nothing from lsmod | grep nvidia.
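
The checks just mentioned could be put together roughly like this. It is a sketch only: nvidia_loaded is a made-up helper name, and I have not run this against a K80.

```shell
# Hypothetical helper: succeeds (exit 0) if lsmod-style input on stdin
# lists a module named exactly "nvidia", fails (exit 1) otherwise.
nvidia_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Usage on a real machine (not run here):
#   lsmod | nvidia_loaded && echo "nvidia module loaded" || echo "not loaded"
#   sudo modprobe nvidia
#   sudo dmesg -T | grep -i nvidia | tail -n 20
```

If modprobe fails, the last dmesg lines should say why the module refused the card.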


I would:

dnf module reset nvidia-driver
dnf module install nvidia-driver:470
dnf distro-sync

That should downgrade the driver packages to version 470. You should read the RHEL docs about switching module streams, for what I wrote above might not be the “pure” way to do it.


If the 470 kernel module fails with error messages similar to those from the 530, then switch to the 460 stream.
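
Since the same three commands get repeated for each stream you try, they could be wrapped as below. This is a rough sketch, not a tested procedure: switch_nvidia_stream, run and the APPLY variable are names I made up, and by default the function only prints the commands so they can be reviewed first.

```shell
# Hypothetical wrapper: print each command, or run it with sudo when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        sudo "$@"
    else
        echo "$*"
    fi
}

# Reset the nvidia-driver module, install the given stream, then sync
# the installed packages to what that stream provides.
switch_nvidia_stream() {
    stream="$1"
    run dnf -y module reset nvidia-driver &&
    run dnf -y module install "nvidia-driver:${stream}" &&
    run dnf -y distro-sync
}

# Usage (not run here):
#   switch_nvidia_stream 470            # only prints the commands
#   APPLY=1 switch_nvidia_stream 470    # runs them, then reboot and try nvidia-smi
```

The fallback to 460 stays a manual step, because the check only makes sense after a reboot.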


You can’t use the latest CUDA version either, because it does not support your card any more. Pick something older. (I trust you know CUDA better.)