TUTORIAL for NVIDIA GPU

Here is a no-nonsense tutorial on how to get an NVIDIA GPU working with Rocky Linux desktop! It works great on an NVIDIA RTX 2080 and others.

  1. sudo dnf upgrade -y

  2. sudo dnf install epel-release

  3. sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo

  4. sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

  5. sudo dnf install nvidia-driver nvidia-settings

  6. sudo dnf install cuda-driver

  7. sudo reboot now
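
After the reboot it is worth checking which kernel module actually drives the card, since a leftover nouveau is a common reason the NVIDIA driver looks installed but unused. A minimal sketch, assuming `lsmod` output is piped in; the function name `active_gpu_driver` is made up for illustration:

```shell
# Classify the active GPU kernel driver from `lsmod` output read on stdin.
# Prints "nvidia", "nouveau", or "none".
# (Hypothetical helper, not part of any NVIDIA or Rocky package.)
active_gpu_driver() {
    awk '
        $1 == "nvidia"  { nvidia = 1 }
        $1 == "nouveau" { nouveau = 1 }
        END {
            if (nvidia)       print "nvidia"
            else if (nouveau) print "nouveau"
            else              print "none"
        }'
}

# Typical use after the reboot:
#   lsmod | active_gpu_driver
```

If this prints "nouveau", the NVIDIA module was not loaded at boot, whatever nvidia-settings reports.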

Nice, but there is another version (particularly if one does not need CUDA):

  1. sudo dnf update
  2. sudo reboot
  3. sudo dnf install elrepo-release
  4. sudo dnf install nvidia-detect
  5. sudo dnf install $(nvidia-detect)
  6. sudo reboot

Note though that you definitely do not want both ELRepo and NVIDIA's repo simultaneously enabled.
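
A quick way to check for that situation might look like the sketch below. It assumes the enabled repo ids start with "elrepo" and "cuda" (the ids on your system may differ), and `repo_conflict_check` is a made-up helper name:

```shell
# Reads `dnf repolist --enabled` output on stdin and reports whether both
# an ELRepo repo and an NVIDIA CUDA repo are enabled at the same time.
# Assumption: repo ids begin with "elrepo" and "cuda" respectively.
repo_conflict_check() {
    awk '
        tolower($1) ~ /^elrepo/ { elrepo = 1 }
        tolower($1) ~ /^cuda/   { cuda = 1 }
        END {
            if (elrepo && cuda) print "conflict"
            else                print "ok"
        }'
}

# Typical use:
#   dnf repolist --enabled | repo_conflict_check
```

If it prints "conflict", one of the two can be switched off with `sudo dnf config-manager --set-disabled` followed by the repo id shown in the repolist.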

Cool, thank you!! I will re-install in a day or so and report back here on how the newer version worked out. I take it you are saying I should use your version? I do not even know what CUDA is for, so I don't need CUDA, one would assume?

CUDA is the NVIDIA framework for GPU parallel programming.

Ivan

Hi. After performing the above steps and rebooting, I am still running with nouveau. Is there another step to get the NVIDIA drivers working?

We have listed two mutually exclusive sets of steps. Which ones did you use?

I used the first method listed. To confirm, I've installed Nvidia 495.29.05 and confirmed it with nvidia-settings. The problem is that according to "sudo lshw -c video" the driver=nouveau. How can I tell the OS to use the Nvidia? Rebooting after the procedure does not change it.

I don't know how the content from NVIDIA's own repo installs.

  • The ELRepo packages append nouveau.modeset=0 rd.driver.blacklist=nouveau plymouth.ignore-udev to the kernel command-line options
  • They also add
Section "Device"
	Identifier  "Videocard0"
	Driver      "nvidia"
EndSection

into X11 configuration. Furthermore, they did disable Wayland too. (Not sure if they still do.)
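
To see whether those kernel options are actually in effect on a running system, something like the following sketch could help. The helper name `check_nouveau_args` is made up, and it only looks at the two nouveau-related options listed above:

```shell
# Report whether the nouveau-disabling options appear on a kernel command line.
# $1: the command line text, e.g. the contents of /proc/cmdline.
# (Hypothetical helper for illustration.)
check_nouveau_args() {
    cmdline=$1
    for opt in nouveau.modeset=0 rd.driver.blacklist=nouveau; do
        case " $cmdline " in
            *" $opt "*) echo "$opt present" ;;
            *)          echo "$opt missing" ;;
        esac
    done
}

# Typical use:
#   check_nouveau_args "$(cat /proc/cmdline)"
#
# If they are missing, they can be added by hand with grubby, e.g.:
#   sudo grubby --update-kernel=ALL --args="nouveau.modeset=0 rd.driver.blacklist=nouveau"
```

On a multi-boot machine the check only says something about the entry you actually booted from, so run it after booting through the Rocky menu entry.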

OK now. To recap: I had two issues. First, in order to disengage nouveau, I had to get back to the Rocky/Fedora boot menu, which was configured as secondary to Manjaro's. (It seems strange to me that nouveau can only be blacklisted in the GRUB boot menu, not in the OS itself.) Once I did that I finally got to the install with the nvidia driver. However, the nvidia install I used from the 1st post here wasn't working properly, as the screen was stuck at a stretched 1024x768 resolution. I needed to restart the procedure, so I removed the nvidia driver with:

sudo dnf remove 'nvidia'

after which I was back with nouveau. From there I ran jlehtone's steps in post 2. That went smoothly and I now seem to have a correctly configured desktop with nvidia drivers. In my setup I just have a couple of extra steps using the EFI setting boot menu.
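
Before a removal like that, it can help to preview which installed packages have "nvidia" in their name. A small sketch, with `nvidia_packages` as a made-up helper name:

```shell
# Filter a list of package names (one per line on stdin) down to the
# ones that mention nvidia, de-duplicated and sorted.
# (Hypothetical helper for illustration.)
nvidia_packages() {
    grep -i 'nvidia' | sort -u
}

# Typical use:
#   rpm -qa --qf '%{NAME}\n' | nvidia_packages
```

dnf accepts shell-style globs in package names, so a quoted pattern such as '*nvidia*' would match everything in that list. Whether the command quoted above originally used such a pattern is only a guess, so review the list before confirming any removal.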

There is some important information here if you want to use an NVIDIA GPU within a virtual machine:

'cuda are fish that swim in da sea.

Before performing step 5, I checked "nvidia-detect -v", which gives the following:

nvidia-detect -v
Probing for supported NVIDIA devices...
[8086:3e9b] Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
[10de:1cbc] NVIDIA Corporation GP107GLM [Quadro P600 Mobile]
This device requires the current 525.85.05 NVIDIA driver
WARNING: Xorg log file /var/log/Xorg.0.log does not exist
WARNING: Unable to determine Xorg ABI compatibility
WARNING: The driver for this device does not support the current Xorg version
An Intel display controller was also detected

So what to make of that? Am I stuck with nouveau drivers, or should I be able to install and use 525.85.05 Nvidia drivers ok?

HUMMMMMM

OK Here is what I got:

Downloading Packages:
nvidia-detect-525.85.05-1.el9.x86_64.rpm 48 kB/s | 24 kB 00:00

Total 34 kB/s | 24 kB 00:00
ELRepo.org Community Enterprise Linux Repository - el9 1.6 MB/s | 1.7 kB 00:00
Importing GPG key 0xBAADAE52:
Userid : "elrepo.org (RPM Signing Key for elrepo.org) secure@elrepo.org"
Fingerprint: 96C0 104F 6315 4731 1E0B B1AE 309B C305 BAAD AE52
From : /etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo.org
Is this ok [y/N]: y
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : nvidia-detect-525.85.05-1.el9.x86_64 1/1
Running scriptlet: nvidia-detect-525.85.05-1.el9.x86_64 1/1
Verifying : nvidia-detect-525.85.05-1.el9.x86_64 1/1

Installed:
nvidia-detect-525.85.05-1.el9.x86_64

Complete!


root@ocelot/home/dcat nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:1c82] NVIDIA Corporation GP107 [GeForce GTX 1050 Ti]
This device requires the current 525.85.05 NVIDIA driver
WARNING: Xorg log file /var/log/Xorg.0.log does not exist
WARNING: Unable to determine Xorg ABI compatibility
WARNING: The driver for this device does not support the current Xorg version


The last time I tried to install the NVidia drivers it blew up in my face. Not eager to repeat the experience.

D' Cat

It seems that nvidia-detect reads the version of X11 from /var/log/Xorg.0.log.
Your system does not have /var/log/Xorg.0.log yet, so you probably have not started X11. Not even once.

The ELRepo driver is built for EL8 and does support the Xorg that is provided by EL8.
Therefore, the "does not support" warning message is false.
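
To confirm that the warning comes from a missing log rather than a real incompatibility, one could check for the log and its version line. A rough sketch, where `xorg_log_status` is a made-up helper and the default path is the one from the warning:

```shell
# Report whether an Xorg log exists and, if so, print its server version line.
# $1: path to the log (defaults to the path named in the nvidia-detect warning).
# (Hypothetical helper for illustration.)
xorg_log_status() {
    log=${1:-/var/log/Xorg.0.log}
    if [ -r "$log" ]; then
        # The X server writes a line such as "X.Org X Server 1.20.11" near the top.
        grep -m1 'X.Org X Server' "$log" || echo "log present, version line not found"
    else
        echo "no Xorg log at $log"
    fi
}

# Typical use:
#   xorg_log_status
```

Note that when X runs without root privileges the log may live under ~/.local/share/xorg/ instead, and a Wayland session may not produce one at all, so "no Xorg log" does not by itself mean anything is broken.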

Team!
I've been trawling this (fantastic) forum - Trying to get Rocky Linux 9.3 and 9.4 working with my NVIDIA A100 GPU... No luck.

Error:

[ 8.722326] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 8.722333] NVRM: The NVIDIA GPU 0000:02:00.0 (PCI ID: 10de:20b0)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 555.42.02 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.

Any hints as to the issue above? (NVIDIA docs suggests otherwise on the compatibility.)

There is no "A100" on the list: Linux x64 (AMD64/EM64T) Display Driver | 555.42.02 | Linux 64-bit | NVIDIA
But they do list for "Data Center Driver": Data Center Driver for Linux RHEL 9 | 550.90.07 | Linux 64-bit RHEL 9 | NVIDIA

The latter seems to download an RPM?
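
For comparing a card against a driver's supported-products list, the PCI ID shown in the NVRM message can also be pulled from lspci. A small sketch, assuming `lspci -nn` output on stdin; `nvidia_pci_ids` is a made-up helper name:

```shell
# Extract NVIDIA vendor:device IDs (vendor 10de) from `lspci -nn` output on stdin.
# (Hypothetical helper for illustration.)
nvidia_pci_ids() {
    grep -i 'nvidia' | grep -o '\[10de:[0-9a-f]\{4\}\]' | tr -d '[]'
}

# Typical use:
#   lspci -nn | nvidia_pci_ids
```

The resulting ID can then be looked up in the README appendix of the driver release you intend to install.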

Thanks very much for the reply. Appreciated. Well spotted. I'd incorrectly assumed that the A100 GPU was supported in the repo versions. I can't get that one to install on Rocky 9.3 or 9.4, so it looks like I'll have to persist trying that path on a clean OS build.

Hi Guys - Worked it out. Turns out that the notes aren't clear at all on what supports what...
The A100 needs the NVIDIA AI Enterprise flavour of drivers (create an NVIDIA Enterprise Trial login to download etc). Then you can download from their portal a special RHEL9 driver (RPM) and that just works first time on RL9.4 - No fussing around at all!

Hope that helps someone with an A100 GPU get their sanity restored.

This did not work for me. I still get the following error when executing nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

This is what I get when I run nvidia-detect -v

[telahee@localhost ~]$ nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:1ff2] NVIDIA Corporation TU117GL [T400 4GB]
This device requires the current 550.107.02 NVIDIA driver kmod-nvidia

I also checked lsmod

[telahee@localhost ~]$ lsmod|grep -i nouveau
[telahee@localhost ~]$ lsmod|grep -i nvidia

And I see nothing. I'm not sure if the driver is not installed properly or if it is misconfigured and so is not being loaded.
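
One way to narrow that down is to walk the chain from package to module file to loaded module. The sketch below assumes the ELRepo package name kmod-nvidia (the one nvidia-detect printed above); `diagnose_nvidia` is a made-up helper name:

```shell
# Walk through the usual failure points when nvidia-smi cannot reach the driver.
# Assumption: the driver comes from the ELRepo kmod-nvidia package.
# (Hypothetical helper for illustration.)
diagnose_nvidia() {
    if ! rpm -q kmod-nvidia >/dev/null 2>&1; then
        echo "kmod-nvidia package is not installed"
    elif ! modinfo nvidia >/dev/null 2>&1; then
        echo "package installed, but no nvidia module found for the running kernel"
    elif ! lsmod | grep -q '^nvidia '; then
        echo "module exists but is not loaded"
    else
        echo "nvidia module is loaded"
    fi
}

# Typical use:
#   diagnose_nvidia
```

If the module exists but is not loaded, `sudo modprobe nvidia` followed by `sudo dmesg | tail` usually shows why. Secure Boot rejecting the module is one possible cause, and `mokutil --sb-state` reports whether Secure Boot is enabled.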