Nvidia Drivers install fails on Rocky Linux 9, "NVIDIA-SMI has failed..."

I get the following error message when I run the command nvidia-smi:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I’ve followed the instructions from this link

Everything seemed to work except the nvidia-smi command.

Thank you in advance!


Hi Sumasri,

Do you have Secure Boot enabled? That might be causing the issue.

Could you share which Nvidia GPU you are using?

Is this a fresh install that you installed the drivers on, or did you previously have other drivers installed?
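For checking the Secure Boot state from a shell, ‘mokutil --sb-state’ reports it. Below is a minimal sketch (bash) that prints the Secure Boot state and whether the nvidia module is currently loaded. It assumes mokutil is installed and prints a line such as “SecureBoot enabled” or “SecureBoot disabled”; the function names are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: report the Secure Boot state and whether the nvidia module is loaded.
# Assumes mokutil is installed and that `mokutil --sb-state` prints a line
# such as "SecureBoot enabled" or "SecureBoot disabled".

# Read `mokutil --sb-state` output on stdin; print enabled, disabled or unknown.
sb_state_from_text() {
  local text
  text=$(cat)
  case "$text" in
    *"SecureBoot enabled"*)  echo enabled ;;
    *"SecureBoot disabled"*) echo disabled ;;
    *)                       echo unknown ;;
  esac
}

# Succeed if /proc/modules-style text on stdin lists a module named exactly "nvidia".
nvidia_loaded_from_text() {
  grep -q '^nvidia '
}

if command -v mokutil >/dev/null 2>&1; then
  echo "Secure Boot: $(mokutil --sb-state 2>&1 | sb_state_from_text)"
else
  echo "mokutil not found; cannot query the Secure Boot state"
fi

if [ -r /proc/modules ] && nvidia_loaded_from_text < /proc/modules; then
  echo "nvidia module: loaded"
else
  echo "nvidia module: not loaded"
fi
```

If Secure Boot is enabled and the module is not loaded, that fits the symptom in this thread, but on its own it does not prove the cause.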


Thank you, disabling the secure boot fixed the problem.


Does anyone have a solution to this that doesn’t involve disabling Secure Boot? That is not likely to go down well at work.

I have Nvidia’s drivers from NVidia’s own yum repository (its name starts with “cuda-”).
Just today I updated the kernel, and ‘dnf history’ shows messages from that update.
It looks like ‘dkms’ did compile the nvidia kernel modules for the new kernel, and
the messages mention “signed” and a path to a *.mok file. I presume that mokutil
could load that certificate into UEFI, and then loading those kernel modules would
also succeed when Secure Boot is enforcing.
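If that presumption holds, the enrollment would look roughly like the sketch below. It assumes the certificate DKMS signs with is a DER file at DKMS’s default path /var/lib/dkms/mok.pub; that path is an assumption, so pass the one from your own ‘dnf history’ output if it differs. Note that ‘mokutil --import’ only queues the key; the actual enrollment happens in MokManager on the next boot.

```bash
#!/usr/bin/env bash
# Sketch: queue the DKMS signing certificate for enrollment in the MOK list, so
# that Secure Boot accepts modules DKMS signed with the matching private key.
# ASSUMPTION: the certificate is DER-encoded at DKMS's default path
# /var/lib/dkms/mok.pub; pass a different path as the first argument if needed.

cert="${1:-/var/lib/dkms/mok.pub}"

# Succeed only if the certificate file exists and is non-empty.
check_cert() {
  if [ -s "$1" ]; then
    return 0
  fi
  echo "certificate not found or empty: $1" >&2
  return 1
}

if check_cert "$cert" && command -v mokutil >/dev/null 2>&1; then
  # mokutil asks for a one-time password; MokManager asks for it again at boot.
  sudo mokutil --import "$cert"
  # List the key(s) now waiting to be enrolled at the next boot.
  sudo mokutil --list-new
  echo "Reboot, choose 'Enroll MOK' in MokManager and enter the same password."
fi
```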

Do you have a link to that repo and instructions on how to use it? Do I have to make sure mokutil is installed, or is it already there? I’ve only installed the minimal version of Rocky.
Thanks

NVidia’s docs seem to be in: cuda-installation-guide-linux 12.1 documentation

Note: one should not use rpm to install local RPM files; dnf can install those just fine.
A lot of “guides” haven’t caught up with that fact.
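For a single file, ‘sudo dnf install ./name.rpm’ is enough (the file name is a placeholder). A small helper for a whole directory of downloaded RPMs might look like the sketch below; install_local_rpms and the DNF dry-run override are hypothetical, made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: install the .rpm files in a directory with dnf rather than rpm -i,
# so dnf pulls any missing dependencies from the enabled repositories.
# install_local_rpms and the DNF override are hypothetical helpers.

install_local_rpms() {
  local dir f
  dir="${1:-.}"
  set --                       # collect matching files in the positional params
  for f in "$dir"/*.rpm; do
    [ -e "$f" ] && set -- "$@" "$f"   # skip the literal pattern if nothing matched
  done
  if [ "$#" -eq 0 ]; then
    echo "no .rpm files in $dir" >&2
    return 1
  fi
  # DNF can be set to e.g. "echo" for a dry run; it is word-split on purpose.
  ${DNF:-sudo dnf} install "$@"
}
```

For example, ‘DNF=echo install_local_rpms ~/Downloads’ only prints the arguments it would pass to dnf.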

mokutil is not a member of any “dnf group”:

el9:2:~$ which mokutil
/usr/bin/mokutil
el9:2:~$ rpm -qf /usr/bin/mokutil
mokutil-0.4.0-9.el9.x86_64
el9:2:~$ dnf -q rq --groupmember mokutil
mokutil-2:0.4.0-9.el9.x86_64

For comparison, openssh-server (sshd) is in two groups:

el9:2:~$ dnf -q rq --groupmember openssh-server
openssh-server-8.7p1-24.el9_1.x86_64
  @core
  @headless-management

I presume that the installer includes mokutil (and other EFI-related tools) when you do an EFI installation, even with Minimal Install. Besides, a GUI desktop is not part of Minimal Install either …

Sorry, I’ve never used mokutil, but it is installed on my system. I followed the instructions in the Nvidia link and they completed successfully, but after rebooting I still get the same problem.

I’d boot to multi-user.target with Secure Boot on, log in on the console as root, and run:

modprobe nvidia

Then look at the output of

dmesg

Since secure boot supposedly prevents loading the ‘nvidia’ module, the error messages are of interest.
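Those two steps can be wrapped in a small script that also narrows the dmesg output. This is a sketch: it must run as root, and the keyword filter is a guess at what is relevant rather than a list of exact kernel messages.

```bash
#!/usr/bin/env bash
# Sketch: try to load the nvidia module and show kernel messages that may
# explain a failure. Run as root, booted with Secure Boot on.

# Keep only kernel-log lines that look related to loading the nvidia module.
# The keyword list is a guess, not an exhaustive set of kernel messages.
filter_nvidia_log() {
  grep -iE 'nvidia|nvrm|lockdown|rejected|signature|x\.509' || true
}

main() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "run this as root" >&2
    return 1
  fi
  # When Secure Boot refuses a module, modprobe typically reports
  # "Key was rejected by service".
  modprobe -v nvidia || echo "modprobe nvidia failed" >&2
  # Show the last 20 matching lines from the kernel ring buffer.
  dmesg | filter_nvidia_log | tail -n 20
}

# Act only when asked explicitly, e.g. `sudo ./nvidia-load-check.sh --run`.
if [ "${1:-}" = "--run" ]; then
  main
fi
```

The script name in the comment is arbitrary; the ‘--run’ guard just keeps the file safe to source or inspect without loading anything.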

Hello -

I have been using the nvidia drivers successfully with the standard RHEL package. Today, when the kernel was upgraded during a system update, the nvidia modules would not build at boot and I had to go back to the old kernel.

What are the commands to rebuild the module for a newly installed kernel? I don’t know whether the new kernel came with headers, etc., but something is wrong.

Sorry, I’m not an expert. This is my nvidia-smi output (after going back to the old kernel):

Sun Apr  2 14:32:43 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01    Driver Version: 525.78.01    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:02:00.0  On |                  N/A |
| 51%   65C    P0     3W /  39W |    842MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11541      G   /usr/libexec/Xorg                 315MiB |
|    0   N/A  N/A     11708      G   /usr/bin/gnome-shell              311MiB |
|    0   N/A  N/A     12484      G   nvidia-settings                     0MiB |
|    0   N/A  N/A     12513      G   /usr/lib64/firefox/firefox        207MiB |
+-----------------------------------------------------------------------------+

uname -a
Linux XXXXX 5.14.0-162.18.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 1 22:02:24 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
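On the question of rebuilding the module for a new kernel: if the driver is the DKMS variant (an assumption; as far as I know, NVIDIA’s precompiled kmod packages are not rebuilt locally), the usual steps are to install kernel-devel for the new kernel and run ‘dkms autoinstall’ for it. A minimal sketch, with a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: rebuild DKMS-managed modules (such as nvidia) for a newly installed
# kernel. ASSUMPTIONS: the driver was installed in its DKMS variant, and the
# kernel-devel package for that exact kernel is available in the enabled repos.

# Print the highest kernel version from a list of versions on stdin.
newest_kernel() {
  sort -V | tail -n 1
}

rebuild_for_kernel() {
  local kver
  kver="$1"
  if [ -z "$kver" ]; then
    kver=$(ls /lib/modules | newest_kernel)
  fi
  echo "Rebuilding DKMS modules for kernel $kver"
  # Headers for exactly this kernel are needed to compile out-of-tree modules.
  sudo dnf install "kernel-devel-$kver" || return 1
  # Build and install every registered DKMS module for that kernel.
  sudo dkms autoinstall -k "$kver" || return 1
  dkms status
}

# Act only when asked, e.g. `./rebuild-nvidia.sh --run <kernel-version>`,
# or just `--run` to target the newest kernel under /lib/modules.
if [ "${1:-}" = "--run" ]; then
  rebuild_for_kernel "${2:-}"
fi
```

With Secure Boot on, the rebuilt modules still have to be signed with a key the firmware trusts, as discussed earlier in this thread.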