TUTORIAL for NVIDIA GPU

Here is a no-nonsense tutorial on how to get an NVIDIA GPU working with the Rocky Linux desktop! It works great on an NVIDIA RTX 2080 and others.

  1. sudo dnf upgrade -y

  2. sudo dnf install epel-release

  3. sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo

  4. sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

  5. sudo dnf install nvidia-driver nvidia-settings

  6. sudo dnf install cuda-driver

  7. sudo reboot now
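
Step 4 ties kernel-devel to whatever `uname -r` reports, so it may be worth checking that the running kernel is also the newest one installed before going past step 1. Below is a minimal sketch of that check. The helper name `kernel_is_newest` is made up for illustration, and the idea that a pending kernel update can cause a mismatch is an assumption on my part, not something the steps above state.

```shell
# Sketch: report whether the running kernel is the newest installed one.
# Usage: kernel_is_newest RUNNING INSTALLED...   (hypothetical helper)
kernel_is_newest() {
    local running newest
    running=$1
    shift
    # sort -V (GNU coreutils) orders version strings; the last line is the newest
    newest=$(printf '%s\n' "$@" | sort -V | tail -n 1)
    if [ "$running" = "$newest" ]; then
        echo "up-to-date"
    else
        echo "reboot-needed"
    fi
}

# On a real system (not executed here):
#   kernel_is_newest "$(uname -r)" \
#       $(rpm -q kernel-core --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n')
```

If it prints "reboot-needed", rebooting before step 4 should make `uname -r` match the kernel the driver will actually run on.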

Nice, but there is another version (particularly if one does not need CUDA):

  1. sudo dnf update
  2. sudo reboot
  3. sudo dnf install elrepo-release
  4. sudo dnf install nvidia-detect
  5. sudo dnf install $(nvidia-detect)
  6. sudo reboot

Note though that you definitely do not want both ELRepo and NVidia’s repo simultaneously enabled.
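
One way to check for that situation is to look at the list of enabled repos. The sketch below reads the output of `dnf repolist --enabled` on standard input. The function name is hypothetical, and the patterns assume the usual repo IDs (something starting with "elrepo" for ELRepo, and "cuda-rhel8-x86_64" or similar for NVIDIA's repo); your IDs may differ.

```shell
# Sketch: detect ELRepo and NVIDIA's CUDA repo being enabled at the same time.
# Reads `dnf repolist --enabled` output on stdin.
# Assumption: repo IDs start with "elrepo" and "cuda" respectively.
check_repo_conflict() {
    local list
    list=$(cat)
    if printf '%s\n' "$list" | grep -qi '^elrepo' &&
       printf '%s\n' "$list" | grep -qi '^cuda'; then
        echo "conflict"
    else
        echo "ok"
    fi
}

# On a real system (not executed here):
#   dnf repolist --enabled | check_repo_conflict
# To disable one of them (repo ID is an assumption, check your own list):
#   sudo dnf config-manager --set-disabled cuda-rhel8-x86_64
```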

Cool, thank you!! I will re-install in a day or so and report back here on how the newer version worked out. I take it you are saying I should use your version? I do not even know what CUDA is for, so I don’t need CUDA, one would assume?

CUDA is the NVIDIA framework for GPU parallel programming.

Ivan

Hi. After performing the above steps and rebooting, I am still running with nouveau. Is there another step to get the NVIDIA drivers working?

We have listed two mutually exclusive sets of steps. Which ones did you use?

I used the first method listed. To confirm, I’ve installed Nvidia 495.29.05 and confirmed it with nvidia-settings. The problem is that according to “sudo lshw -c video” the driver=nouveau. How can I tell the OS to use the NVIDIA driver? Rebooting after the procedure does not change it.
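
Another way to see which kernel driver is bound to the card is `lspci -k` (an alternative to the lshw command above, not what the poster ran). The sketch below filters its output down to the "Kernel driver in use" value for NVIDIA display devices. The function name is made up for illustration.

```shell
# Sketch: print the kernel driver bound to each NVIDIA display device.
# Reads `lspci -k` output on stdin.
nvidia_driver_in_use() {
    awk '
        # Device header lines start with the bus address (hex digits)
        /^[0-9a-f]/ { gpu = ($0 ~ /NVIDIA/ && $0 ~ /(VGA|3D|Display)/) }
        # Indented detail lines belong to the most recent header
        gpu && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
        }
    '
}

# On a real system (not executed here):
#   lspci -k | nvidia_driver_in_use
```

If this prints nouveau, the NVIDIA module is installed but not the one in use. If it prints nothing, no driver is bound to the card at all.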

I don’t know how the content from NVidia’s own repo installs.

  • The ELRepo packages append nouveau.modeset=0 rd.driver.blacklist=nouveau plymouth.ignore-udev to the kernel command-line options
  • They also add
Section "Device"
	Identifier  "Videocard0"
	Driver      "nvidia"
EndSection

into the X11 configuration. Furthermore, they also disabled Wayland. (Not sure if they still do.)
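
To see whether those kernel options are present on a system that did not get them from the ELRepo packages, something like the sketch below could be used. It only checks for the two nouveau-related options named above. The function name is hypothetical, and the grubby command in the comments is my suggestion for adding the options by hand, not something the ELRepo packages are confirmed to run.

```shell
# Sketch: list which nouveau-disabling options are missing from a kernel
# command line passed as the first argument.
missing_nouveau_args() {
    local cmdline arg
    cmdline=$1
    for arg in nouveau.modeset=0 rd.driver.blacklist=nouveau; do
        case " $cmdline " in
            *" $arg "*) ;;          # option present, nothing to report
            *) echo "$arg" ;;       # option missing
        esac
    done
}

# On a real system (not executed here):
#   missing_nouveau_args "$(cat /proc/cmdline)"
# Adding the options to all installed kernels by hand (assumption: grubby
# manages the boot entries, as it does on a default Rocky install):
#   sudo grubby --update-kernel=ALL --args="nouveau.modeset=0 rd.driver.blacklist=nouveau"
```

No output means both options are already on the command line of the running kernel.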

OK now. To recap: I had two issues. First, in order to disengage nouveau, I had to get back to the Rocky/Fedora boot menu, which was configured as secondary to Manjaro’s. (It seems strange to me that nouveau can only be blacklisted in the GRUB boot menu, not in the OS itself.) Once I did that, I finally got to the install with the nvidia driver. However, the nvidia install I used from the first post here wasn’t working properly, as the screen was stuck at a stretched 1024x768 resolution. I needed to restart the procedure, so I removed the nvidia driver with:

sudo dnf remove 'nvidia'

after which I was back with nouveau. From there I ran jlehtone’s steps in post 2. That went smoothly and I now seem to have a correctly configured desktop with nvidia drivers. In my setup I just have a couple of extra steps using the EFI boot menu settings.

There is some important information here if you want to use an NVIDIA GPU within a virtual machine:

'cuda are fish that swim in da sea.

Before performing step 5, I checked “nvidia-detect -v”, which gives the following:

nvidia-detect -v
Probing for supported NVIDIA devices…
[8086:3e9b] Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
[10de:1cbc] NVIDIA Corporation GP107GLM [Quadro P600 Mobile]
This device requires the current 525.85.05 NVIDIA driver
WARNING: Xorg log file /var/log/Xorg.0.log does not exist
WARNING: Unable to determine Xorg ABI compatibility
WARNING: The driver for this device does not support the current Xorg version
An Intel display controller was also detected

So what to make of that? Am I stuck with nouveau drivers, or should I be able to install and use 525.85.05 Nvidia drivers ok?

HUMMMMMM

OK Here is what I got:

Downloading Packages:
nvidia-detect-525.85.05-1.el9.x86_64.rpm 48 kB/s | 24 kB 00:00

Total 34 kB/s | 24 kB 00:00
ELRepo.org Community Enterprise Linux Repository - el9 1.6 MB/s | 1.7 kB 00:00
Importing GPG key 0xBAADAE52:
Userid : "elrepo.org (RPM Signing Key for elrepo.org) secure@elrepo.org"
Fingerprint: 96C0 104F 6315 4731 1E0B B1AE 309B C305 BAAD AE52
From : /etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo.org
Is this ok [y/N]: y
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : nvidia-detect-525.85.05-1.el9.x86_64 1/1
Running scriptlet: nvidia-detect-525.85.05-1.el9.x86_64 1/1
Verifying : nvidia-detect-525.85.05-1.el9.x86_64 1/1

Installed:
nvidia-detect-525.85.05-1.el9.x86_64

Complete!


root@ocelot/home/dcat nvidia-detect -v
Probing for supported NVIDIA devices…
[10de:1c82] NVIDIA Corporation GP107 [GeForce GTX 1050 Ti]
This device requires the current 525.85.05 NVIDIA driver
WARNING: Xorg log file /var/log/Xorg.0.log does not exist
WARNING: Unable to determine Xorg ABI compatibility
WARNING: The driver for this device does not support the current Xorg version


The last time I tried to install the NVidia drivers it blew up in my face. Not eager to repeat the experience.

D’ Cat

It seems that nvidia-detect reads the X11 version from /var/log/Xorg.0.log.
Your system does not have /var/log/Xorg.0.log yet; you probably have not started X11, not even once.

The ELRepo driver is built for EL8 and does support the Xorg that is provided by EL8.
Therefore, the “does not support” warning message is false.
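
Going by that explanation, the Xorg warning is only suspect when it shows up together with the missing-log warning. Here is a small sketch that checks for that combination in `nvidia-detect -v` output. The function name is made up, and the rule it applies is just the reasoning from this thread, not documented nvidia-detect behaviour.

```shell
# Sketch: flag the "does not support the current Xorg version" warning as
# suspect when nvidia-detect also reports that the Xorg log file is missing.
# Reads `nvidia-detect -v` output on stdin.
xorg_warning_is_suspect() {
    local out
    out=$(cat)
    if printf '%s\n' "$out" | grep -q 'Xorg log file .* does not exist' &&
       printf '%s\n' "$out" | grep -q 'does not support the current Xorg version'; then
        echo "suspect"
    else
        echo "no-suspect-warning"
    fi
}

# On a real system (not executed here):
#   nvidia-detect -v 2>&1 | xorg_warning_is_suspect
```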

Team!
I’ve been trawling this (fantastic) forum - Trying to get Rocky Linux 9.3 and 9.4 working with my NVIDIA A100 GPU… No luck.

Error:

[ 8.722326] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 8.722333] NVRM: The NVIDIA GPU 0000:02:00.0 (PCI ID: 10de:20b0)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 555.42.02 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release’s README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.

Any hints as to the issue above? (NVIDIA docs suggests otherwise on the compatibility.)

There is no “A100” on the list: Linux x64 (AMD64/EM64T) Display Driver | 555.42.02 | Linux 64-bit | NVIDIA
But they do list for “Data Center Driver”: Data Center Driver for Linux RHEL 9 | 550.90.07 | Linux 64-bit RHEL 9 | NVIDIA

The latter seems to download an RPM?
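
When comparing a card against a driver's supported-products list, the PCI ID from the kernel message is handy to have. A minimal sketch for pulling it out of NVRM log lines like the ones quoted above (the function name is hypothetical):

```shell
# Sketch: extract the vendor:device PCI IDs that NVRM complains about.
# Reads kernel log text (e.g. dmesg output) on stdin.
nvrm_pci_ids() {
    grep -oE 'PCI ID: [0-9a-fA-F]{4}:[0-9a-fA-F]{4}' |
        awk '{ print $3 }' |
        sort -u
}

# On a real system (not executed here):
#   sudo dmesg | nvrm_pci_ids
# Listing all NVIDIA devices with their IDs, independent of any driver:
#   lspci -nn -d 10de:
```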

Thanks very much for the reply. Appreciated. Well spotted. I’d incorrectly assumed that the A100 GPU was supported in the repo versions. I can’t get that one to install on Rocky 9.3 or 9.4, so it looks like I’ll have to persist trying that path on a clean OS build.

Hi Guys - Worked it out. Turns out that the notes aren’t clear at all on what supports what…
The A100 needs the Enterprise AEI flavour of drivers (create an NVIDIA Enterprise Trial login to download etc). Then you can download from their portal a special RHEL9 driver (RPM) and that just works first time on RL9.4 - No fussing around at all!

Hope that helps someone with an A100 GPU get their sanity restored.

This did not work for me. I still get the following error when executing nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

This is what I get when I run nvidia-detect -v

[telahee@localhost ~]$ nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:1ff2] NVIDIA Corporation TU117GL [T400 4GB]
This device requires the current 550.107.02 NVIDIA driver kmod-nvidia

I also checked lsmod

[telahee@localhost ~]$ lsmod|grep -i nouveau
[telahee@localhost ~]$ lsmod|grep -i nvidia

And I see nothing. I’m not sure if the driver is not installed properly or if it is misconfigured and so is not being loaded.
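
With neither module showing up in lsmod, a few more checks may help narrow it down. The sketch below classifies lsmod output, and the comments list follow-up commands to run by hand. The function name is made up, and the Secure Boot check is an assumption about a common cause, not something established in this thread.

```shell
# Sketch: classify which GPU kernel module is loaded, from lsmod output on stdin.
gpu_module_state() {
    awk '
        NR > 1 { mods[$1] = 1 }     # skip the "Module Size Used by" header
        END {
            if ("nvidia" in mods) print "nvidia"
            else if ("nouveau" in mods) print "nouveau"
            else print "none"
        }
    '
}

# On a real system (not executed here):
#   lsmod | gpu_module_state
# If it prints "none", these may show why the module is not loading:
#   rpm -qa | grep -i nvidia                  # is kmod-nvidia installed at all?
#   modinfo nvidia                            # does a module exist for this kernel?
#   sudo modprobe nvidia                      # try loading it and read the error
#   sudo dmesg | grep -iE 'nvrm|nvidia'       # kernel messages from the driver
#   mokutil --sb-state                        # assumption: Secure Boot may block unsigned modules
```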