How to Install NVIDIA Driver and CUDA on Rocky 9.1

I’m new to Rocky and want to install the NVIDIA driver and CUDA on my freshly installed Rocky 9.1. After a full day of attempts I still cannot get them installed. Can someone help me with this? Here is the basic info:

  1. Hardware:

(1)cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c

20 12th Gen Intel(R) Core™ i9-12900H

(2)sudo lshw -numeric -C display

[sudo]
*-display
description: 3D controller
product: GA107GLM [RTX A2000 8GB Laptop GPU] [10DE:25BA]
vendor: NVIDIA Corporation [10DE]
physical id: 0
bus info: pci@0000:01:00.0
logical name: /dev/fb0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list fb
configuration: depth=32 latency=0 mode=1920x1080 visual=truecolor xres=1920 yres=1080

I tried many approaches, including disabling nouveau, and all of them failed. Finally I followed the steps of the first solution in the thread https://forums.rockylinux.org/t/tutorial-for-nvidia-gpu/4234 . All of the commands ran successfully:

  1. sudo dnf update && sudo dnf upgrade -y
  2. sudo dnf install epel-release
  3. sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
  4. sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
  5. sudo dnf install nvidia-driver nvidia-settings
  6. sudo dnf install cuda-driver
  7. sudo reboot now

but when I run the command to check the NVIDIA driver, it still does not run successfully:

What can I do to get it installed successfully? Thanks so much.

NVIDIA’s repo has “dnf modules” (aka “streams”).
See

dnf module list nvidia-driver

Perhaps all necessary packages were not included?
It is possible to install a set of packages (that make a stream) with:

dnf module install stream_spec

where stream_spec is something like nvidia-driver:latest-dkms


Another possibility is that the install did not generate the kernel module.


Third is SecureBoot. Check its status and modprobe messages with:

mokutil --sb
dmesg | grep nvidia

If you do have SecureBoot on and did not load dkms.mok signature to UEFI (with mokutil), then the untrusted kernel module fails to load.
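
A minimal sketch of that Secure Boot check as a script, assuming mokutil is installed. sb_state is a hypothetical helper name, the matched strings are what mokutil --sb-state is expected to print, and I am assuming the --sb used above is accepted as a shortened form of --sb-state:

```shell
#!/bin/sh
# Sketch: classify the text printed by `mokutil --sb-state`.
# sb_state is a made-up helper name, not part of mokutil.
sb_state() {
    case "$1" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}

if command -v mokutil >/dev/null 2>&1; then
    state=$(sb_state "$(mokutil --sb-state 2>&1)")
else
    state=unknown   # mokutil missing, cannot tell
fi
echo "Secure Boot: $state"

if [ "$state" = enabled ]; then
    echo "An unsigned or untrusted nvidia module will be refused by the kernel."
fi
```

On a machine that did not boot through UEFI the result should simply be "unknown".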

I do have Secure Boot enabled, and dmesg | grep nvidia prints no output. How do I load the dkms.mok signature to UEFI (with mokutil)?

First, mokutil --sb merely shows whether SecureBoot is enabled or not.
One has to enable/disable SecureBoot in the UEFI.

The dmesg | grep nvidia should show something even when the module load fails due to a missing certificate.
One might look with dmesg | grep -i nvidia too, since some messages might use uppercase.

Lack of any messages points to a total lack of the kernel module. The modinfo nvidia can tell something even when the module exists but cannot be loaded.

The dkms should put the modules into /lib/modules/5.14.0-*/extra/. Do you have anything in any of those?
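
A rough sketch of that lookup, in case it helps. find_nvidia_modules is a hypothetical helper name, and the file pattern nvidia*.ko* is my assumption about how the built modules are named (possibly compressed):

```shell
#!/bin/sh
# Sketch: look for built nvidia kernel modules under a modules directory.
# find_nvidia_modules is a made-up helper; the name pattern is an assumption.
find_nvidia_modules() {
    # $1: a modules directory such as /lib/modules/$(uname -r)
    find "$1" -name 'nvidia*.ko*' 2>/dev/null || true
}

kdir="/lib/modules/$(uname -r)"
mods=$(find_nvidia_modules "$kdir")
if [ -n "$mods" ]; then
    echo "$mods"
else
    echo "no nvidia kernel module found under $kdir"
fi

# If dkms is installed, its own view of what was built for which kernel.
if command -v dkms >/dev/null 2>&1; then
    dkms status || true
fi
```

If the directory is empty but dkms status lists the driver for a different kernel version, the module was probably built for a kernel other than the one currently running.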

Let me know in this thread if you find a solution, as I seem to be fighting with the exact same problem.

I installed the drivers according to instructions (not sure if they were exactly the same steps as yours, but they look somewhat familiar), no errors detected, but nvidia-smi gives similar output after a reboot.

I wouldn’t have guessed it might have something to do with e.g. Secure Boot, which I have enabled because I also have Windows 11 running on this laptop (I think Win11 requires it by default). I suspected the problem might be this laptop having both Intel HD and NVIDIA GPUs, and that somehow messing up the installation. E.g. lspci | grep -e VGA listed the Intel HD graphics chipset for me, not the GeForce GPU.
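
On the lspci point: the lshw output in the first post describes the NVIDIA GPU as a "3D controller", so a grep for VGA alone would likely skip it on a hybrid-graphics laptop. A small sketch that matches the usual display device classes instead (gpu_lines is a hypothetical helper name, and the class names are the ones I would expect lspci to print):

```shell
#!/bin/sh
# Sketch: keep only GPU-class lines from lspci output read on stdin.
# gpu_lines is a made-up helper name.
gpu_lines() {
    grep -Ei 'vga compatible controller|3d controller|display controller'
}

if command -v lspci >/dev/null 2>&1; then
    lspci -nn | gpu_lines || echo "no GPU-class devices listed"
else
    echo "lspci not found (it is in the pciutils package)"
fi
```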

I think I used these instructions:

There also seems to be some discussion on this forum:

If you have Secure Boot enabled I don’t think the driver from Nvidia’s official CUDA repo will work out of the box. You may have to manually install the .run file downloaded from the Nvidia driver website. Make sure you generate a “key pair” when the installer asks you (the first time you install the driver), and then use mokutil to enroll it manually after installation.
I followed an installation guide for the Nvidia driver on Fedora 37/38, which I think also works for RHEL. I’ll try to find it if you need it.

It will. The dkms – when it runs – generates the “key pair” (aka certificate) automatically. Use mokutil to enroll it manually when that has happened.

I disabled Secure Boot in the BIOS config and it finally worked.

Hi guys, I finally solved it by disabling Secure Boot in the BIOS. Here are the steps:

  1. check which driver version your laptop needs and download the .run file from the official website;
  2. disable Secure Boot in the BIOS;
  3. disable nouveau;
  4. switch to a tty and run the .run file with sudo, passing the --no-x-check parameter, i.e.
sudo sh ./NVIDIA-Linux-x86_64-515.48.07.run --no-x-check
  5. reboot and check whether it installed successfully: nvidia-smi
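
Before running the installer it may be worth confirming that nouveau really is out of the way. A minimal sketch that reads /proc/modules (module_loaded is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: check whether a kernel module appears in /proc/modules-style input.
# module_loaded is a made-up helper; it reads the module list on stdin.
module_loaded() {
    # $1: module name; exit status 0 if the first column matches it
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

if [ -r /proc/modules ] && module_loaded nouveau < /proc/modules; then
    echo "nouveau is still loaded, the NVIDIA installer will likely complain"
else
    echo "nouveau is not loaded"
fi
```

The same helper with nvidia as the argument can be used after the reboot, next to nvidia-smi.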

Do you have the installation guide?
I’d like to have more precise commands to follow, as I have no idea how to e.g. sign the keypair manually with mokutil (?).

I am generally hesitant to test out different things with Secure Boot, as I have sometimes ended up with unbootable OS installations when I played with the Secure Boot settings on my own without fully understanding what I was doing.

I actually found instructions that seem to be working, without having to disable Secure Boot:

https://www.reddit.com/r/RockyLinux/comments/woqgij/rocky_9_nvidia_drivers_secure_boot_success/

I changed the instructions a bit, e.g. the non-working “sudo echo >” commands in the original instructions:

sudo dnf install epel-release 
sudo dnf upgrade
sudo reboot

After the reboot:

sudo dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo
sudo dnf install kernel-headers-$(uname -r) kernel-devel-$(uname -r) tar bzip2 make automake gcc gcc-c++ pciutils elfutils-libelf-devel libglvnd-opengl libglvnd-glx libglvnd-devel acpid pkgconfig dkms
sudo dnf module install nvidia-driver:latest-dkms
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo 'omit_drivers+=" nouveau "' | sudo tee /etc/dracut.conf.d/blacklist-nouveau.conf
sudo dracut --regenerate-all --force
sudo depmod -a

The mokutil command will prompt you to make up a password, which will be used during the reboot:

sudo mokutil --import /var/lib/dkms/mok.pub
sudo reboot

After the reboot, your system should ask whether you want to enroll a key or something like that. Say yes, and it will ask for the password you gave in the mokutil command.
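
To see where the enrollment stands, something along these lines might help. It is only a sketch: check_mok is a hypothetical helper name, the certificate path is the one used in the mokutil --import command above, and I am assuming mokutil's --test-key and --list-new options behave as documented in its man page (they may need root):

```shell
#!/bin/sh
# Sketch: report whether the dkms certificate is enrolled or queued.
# check_mok is a made-up helper name; run the script as root if mokutil
# reports permission errors.
MOK_CERT=${MOK_CERT:-/var/lib/dkms/mok.pub}

check_mok() {
    cert=$1
    if [ ! -r "$cert" ]; then
        echo "certificate not found: $cert" >&2
        return 2
    fi
    if ! command -v mokutil >/dev/null 2>&1; then
        echo "mokutil is not installed" >&2
        return 3
    fi
    # Says whether this certificate is already enrolled.
    mokutil --test-key "$cert"
    # Lists keys queued for enrollment at the next boot.
    mokutil --list-new
}

check_mok "$MOK_CERT" || echo "MOK check skipped or failed"
```

Before the reboot the key should show up in the queued list, and after answering the enrollment prompt it should be reported as enrolled.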

Later on, check that the nvidia drivers are installed:

nvidia-smi

If you dual boot Windows, a Windows-driven firmware update may reset the TPM chip (?). Anyway, any time this happens you’ll need to re-run ‘mokutil --import /var/lib/dkms/mok.pub’

I’m glad that my instructions were helpful! It was a battle getting them to work – but going on a year of no problems!

Sorry about the broken sudo commands, I actually ran it all as root and just changed it to sudo without testing.

This seems to be what I should have done: permissions - How to insert text into a root-owned file using sudo? - Unix & Linux Stack Exchange
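
For anyone wondering why "sudo echo ... > file" does not work: the redirection is performed by your own unprivileged shell before sudo even starts, so only echo runs as root. Piping into tee, as in the corrected commands above, moves the file write into the privileged process. A small sketch (write_line is a hypothetical helper name, and the demo writes to a temp file so it needs no root):

```shell
#!/bin/sh
# Sketch: write one line to a file, optionally through a command prefix
# such as sudo. write_line is a made-up helper name.
write_line() {
    # $1: target file, $2: text, remaining args: optional prefix (e.g. sudo)
    file=$1
    text=$2
    shift 2
    # tee opens the file, so the prefix decides who does the writing.
    printf '%s\n' "$text" | "$@" tee "$file" >/dev/null
}

# Demo on a temporary file, no privileges needed.
tmp=$(mktemp)
write_line "$tmp" "blacklist nouveau"
cat "$tmp"
rm -f "$tmp"

# For a root-owned file the call would be, for example:
#   write_line /etc/modprobe.d/blacklist-nouveau.conf "blacklist nouveau" sudo
```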
