NVIDIA GPU drivers on headless RL server

Hello guys!
I’m trying to set up a headless RL server for compute workloads.
The server is equipped with an NVIDIA GPU and runs RL 8.4 without a graphical desktop environment.
I’m trying to install the NVIDIA GPU drivers to enable CUDA support without also installing a graphical desktop environment.
How can I do it?
Thanks for the help.

Ivan

NVIDIA has Yum repositories that provide the CUDA toolkit and drivers properly RPM-packaged; see Installation Guide Linux :: CUDA Toolkit Documentation

PS. There is no “RL 8.4”. There is “RL 8”, and its content is currently based on RHEL 8.5.
If you still have content from the 8.4 release, run sudo dnf up ASAP.
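
To see which point release a machine currently reports before upgrading, a minimal sketch like the following could be used. It assumes the system has a standard os-release file with a VERSION_ID field; `release_version` is a hypothetical helper name, not part of any tool.

```shell
# Print the VERSION_ID recorded in an os-release style file.
# $1: path to the file (defaults to /etc/os-release, an assumption
#     about where the running system keeps it).
release_version() {
    # Source the file in a subshell so its variables do not leak out.
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_ID" )
}

# Example usage on the server itself:
#   release_version
#   sudo dnf up        # pull in the current RL 8 content
```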

Thanks for the response.
As you can see, following the procedure suggested by NVIDIA (which I have already done many times) pulls in a lot of graphical dependencies (e.g., the Xorg server).

# dnf module install nvidia-driver:latest-dkms
Last metadata expiration check: 0:00:51 ago on Sat 19 Feb 2022 09:38:40 AM EST.
Dependencies resolved.
=============================================================================================================================================================
 Package                                  Architecture          Version                                               Repository                        Size
=============================================================================================================================================================
Installing group/module packages:
 cuda-drivers                             x86_64                510.47.03-1                                           cuda-rhel8-x86_64                7.0 k
 kmod-nvidia-latest-dkms                  x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 28 M
 nvidia-driver                            x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 22 M
 nvidia-driver-NVML                       x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                516 k
 nvidia-driver-NvFBCOpenGL                x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 52 k
 nvidia-driver-cuda                       x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                591 k
 nvidia-driver-cuda-libs                  x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 63 M
 nvidia-driver-devel                      x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 12 k
 nvidia-driver-libs                       x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                168 M
 nvidia-kmod-common                       noarch                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 12 k
 nvidia-libXNVCtrl                        x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 25 k
 nvidia-libXNVCtrl-devel                  x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 55 k
 nvidia-modprobe                          x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 36 k
 nvidia-persistenced                      x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                 42 k
 nvidia-settings                          x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                832 k
 nvidia-xconfig                           x86_64                3:510.47.03-1.el8                                     cuda-rhel8-x86_64                105 k
Installing dependencies:
 adwaita-cursor-theme                     noarch                3.28.0-2.el8                                          appstream                        646 k
 adwaita-icon-theme                       noarch                3.28.0-2.el8                                          appstream                         11 M
 at-spi2-atk                              x86_64                2.26.2-1.el8                                          appstream                         88 k
 at-spi2-core                             x86_64                2.28.0-1.el8                                          appstream                        168 k
 colord-libs                              x86_64                1.4.2-1.el8                                           appstream                        234 k
 dkms                                     noarch                3.0.3-1.el8                                           epel                              62 k
 dnf-plugin-nvidia                        noarch                2.0-1.el8                                             cuda-rhel8-x86_64                 12 k
 egl-wayland                              x86_64                1.1.7-1.el8                                           appstream                         33 k
 gtk3                                     x86_64                3.22.30-8.el8                                         appstream                        4.5 M
 libX11-devel                             x86_64                1.6.8-5.el8                                           appstream                        975 k
 libX11-xcb                               x86_64                1.6.8-5.el8                                           appstream                         13 k
 libXau-devel                             x86_64                1.0.9-3.el8                                           appstream                         19 k
 libXdmcp                                 x86_64                1.1.3-1.el8                                           appstream                         40 k
 libXfont2                                x86_64                2.0.3-2.el8                                           appstream                        147 k
 libXtst                                  x86_64                1.2.3-7.el8                                           appstream                         21 k
 libdrm                                   x86_64                2.4.106-2.el8                                         appstream                        166 k
 libepoxy                                 x86_64                1.5.8-1.el8                                           appstream                        224 k
 libevdev                                 x86_64                1.10.0-1.el8                                          appstream                         43 k
 libglvnd                                 x86_64                1:1.3.2-1.el8                                         appstream                        126 k
 libglvnd-egl                             x86_64                1:1.3.2-1.el8                                         appstream                         48 k
 libglvnd-gles                            x86_64                1:1.3.2-1.el8                                         appstream                         39 k
 libglvnd-glx                             x86_64                1:1.3.2-1.el8                                         appstream                        136 k
 libglvnd-opengl                          x86_64                1:1.3.2-1.el8                                         appstream                         46 k
 libinput                                 x86_64                1.16.3-2.el8                                          appstream                        216 k
 libpciaccess                             x86_64                0.14-1.el8                                            baseos                            31 k
 libvdpau                                 x86_64                1.4-2.el8                                             appstream                         40 k
 libwacom                                 x86_64                1.6-3.el8                                             appstream                         41 k
 libwacom-data                            noarch                1.6-3.el8                                             appstream                        103 k
 libwayland-client                        x86_64                1.19.0-1.el8                                          appstream                         38 k
 libwayland-cursor                        x86_64                1.19.0-1.el8                                          appstream                         25 k
 libwayland-egl                           x86_64                1.19.0-1.el8                                          appstream                         18 k
 libwayland-server                        x86_64                1.19.0-1.el8                                          appstream                         46 k
 libxcb-devel                             x86_64                1.13.1-1.el8                                          appstream                        1.1 M
 libxkbfile                               x86_64                1.1.0-1.el8                                           appstream                         87 k
 libxshmfence                             x86_64                1.3-2.el8                                             appstream                         12 k
 llvm-libs                                x86_64                12.0.1-2.module+el8.5.0+692+8756646f                  appstream                         23 M
 mesa-libEGL                              x86_64                21.1.5-1.el8.0                                        appstream                        134 k
 mesa-libGL                               x86_64                21.1.5-1.el8.0                                        appstream                        183 k
 mesa-libgbm                              x86_64                21.1.5-1.el8.0                                        appstream                         56 k
 mesa-libglapi                            x86_64                21.1.5-1.el8.0                                        appstream                         64 k
 mesa-vulkan-drivers                      x86_64                21.1.5-1.el8.0                                        appstream                        6.1 M
 mtdev                                    x86_64                1.1.5-12.el8                                          appstream                         23 k
 ocl-icd                                  x86_64                2.2.12-1.el8                                          appstream                         50 k
 opencl-filesystem                        noarch                1.0-6.el8                                             appstream                        7.3 k
 rest                                     x86_64                0.8.1-2.el8                                           appstream                         69 k
 vulkan-loader                            x86_64                1.2.198.0-2.el8_5                                     appstream                        122 k
 xorg-x11-drv-fbdev                       x86_64                0.5.0-2.el8                                           appstream                         26 k
 xorg-x11-drv-libinput                    x86_64                0.29.0-1.el8                                          appstream                         49 k
 xorg-x11-drv-vesa                        x86_64                2.4.0-3.el8                                           appstream                         30 k
 xorg-x11-proto-devel                     noarch                2020.1-3.el8                                          appstream                        279 k
 xorg-x11-server-Xorg                     x86_64                1.20.11-2.el8                                         appstream                        1.5 M
 xorg-x11-server-common                   x86_64                1.20.11-2.el8                                         appstream                         41 k
 xorg-x11-xkb-utils                       x86_64                7.7-28.el8                                            appstream                        113 k
Installing weak dependencies:
 dconf                                    x86_64                0.28.0-4.el8                                          appstream                        107 k
Installing module profiles:
 nvidia-driver/default
Enabling module streams:
 llvm-toolset                                                   rhel8
 nvidia-driver                                                  latest-dkms

Transaction Summary
=============================================================================================================================================================
Install  70 Packages

Total download size: 336 M
Installed size: 836 M
Is this ok [y/N]:

I’m looking for a procedure to install CUDA without the graphical stuff.
Thank you.

Ivan

If you look at the installation guide jlehtone linked, there are instructions in its advanced-setup section.

sudo dnf install cuda-toolkit-11-6  nvidia-driver-cuda akmod-nvidia

Hi @tkuraku!
I’ve already tried:

# dnf install cuda-toolkit-11-6 nvidia-driver-cuda akmod-nvidia
Last metadata expiration check: 0:36:40 ago on Sat 19 Feb 2022 02:39:27 PM EST.
No match for argument: akmod-nvidia
Error: Unable to find a match: akmod-nvidia

Next I found the package kmod-nvidia-latest-dkms:

# dnf info kmod-nvidia-latest-dkms
Last metadata expiration check: 0:02:14 ago on Sat 19 Feb 2022 03:24:05 PM EST.
Available Packages
Name         : kmod-nvidia-latest-dkms
Epoch        : 3
Version      : 510.47.03
Release      : 1.el8
Architecture : x86_64
Size         : 28 M
Source       : kmod-nvidia-latest-dkms-510.47.03-1.el8.src.rpm
Repository   : cuda-rhel8-x86_64
Summary      : NVIDIA display driver kernel module
URL          : http://www.nvidia.com/object/unix.html
License      : NVIDIA License
Description  : This package provides the proprietary Nvidia kernel driver
             : modules. The modules are rebuilt through the DKMS system when a
             : new kernel or modules become available.

I think this could replace akmod-nvidia.
The problem is that when I try to install the packages, it seems Xorg will be installed, but I don’t need it on a headless server.

# dnf install cuda-toolkit-11-6 nvidia-driver-cuda kmod-nvidia-latest-dkms
Last metadata expiration check: 0:08:42 ago on Sat 19 Feb 2022 03:24:05 PM EST.
Dependencies resolved.
=============================================================================================================================================================
 Package                                           Architecture       Version                                            Repository                     Size
=============================================================================================================================================================
Installing:
 cuda-toolkit-11-6                                 x86_64             11.6.0-1                                           cuda-rhel8-x86_64             4.2 k
 kmod-nvidia-latest-dkms                           x86_64             3:510.47.03-1.el8                                  cuda-rhel8-x86_64              28 M
 nvidia-driver-cuda                                x86_64             3:510.47.03-1.el8                                  cuda-rhel8-x86_64             591 k
Installing dependencies:
 alsa-lib                                          x86_64             1.2.5-4.el8                                        appstream                     488 k
 cuda-cccl-11-6                                    x86_64             11.6.55-1                                          cuda-rhel8-x86_64             1.7 M
 cuda-command-line-tools-11-6                      x86_64             11.6.0-1                                           cuda-rhel8-x86_64             2.9 k
 cuda-compiler-11-6                                x86_64             11.6.0-1                                           cuda-rhel8-x86_64             2.7 k
 cuda-cudart-11-6                                  x86_64             11.6.55-1                                          cuda-rhel8-x86_64             215 k
 cuda-cudart-devel-11-6                            x86_64             11.6.55-1                                          cuda-rhel8-x86_64             1.3 M
 cuda-cuobjdump-11-6                               x86_64             11.6.55-1                                          cuda-rhel8-x86_64             134 k
 cuda-cupti-11-6                                   x86_64             11.6.55-1                                          cuda-rhel8-x86_64              23 M
 cuda-cuxxfilt-11-6                                x86_64             11.6.55-1                                          cuda-rhel8-x86_64             276 k
 cuda-documentation-11-6                           x86_64             11.6.55-1                                          cuda-rhel8-x86_64             121 k
 cuda-driver-devel-11-6                            x86_64             11.6.55-1                                          cuda-rhel8-x86_64              35 k
 cuda-gdb-11-6                                     x86_64             11.6.55-1                                          cuda-rhel8-x86_64             5.2 M
 cuda-libraries-11-6                               x86_64             11.6.0-1                                           cuda-rhel8-x86_64             2.9 k
 cuda-libraries-devel-11-6                         x86_64             11.6.0-1                                           cuda-rhel8-x86_64             3.1 k
 cuda-license-10-1                                 x86_64             10.1.243-1                                         cuda-rhel8-x86_64              22 k
 cuda-memcheck-11-6                                x86_64             11.6.55-1                                          cuda-rhel8-x86_64             172 k
 cuda-nsight-11-6                                  x86_64             11.6.55-1                                          cuda-rhel8-x86_64             113 M
 cuda-nsight-compute-11-6                          x86_64             11.6.0-1                                           cuda-rhel8-x86_64             5.2 k
 cuda-nsight-systems-11-6                          x86_64             11.6.0-1                                           cuda-rhel8-x86_64             4.9 k
 cuda-nvcc-11-6                                    x86_64             11.6.55-1                                          cuda-rhel8-x86_64              47 M
 cuda-nvdisasm-11-6                                x86_64             11.6.55-1                                          cuda-rhel8-x86_64              31 M
 cuda-nvml-dev-10-1                                x86_64             10.1.243-1                                         cuda-rhel8-x86_64              70 k
 cuda-nvml-devel-11-6                              x86_64             11.6.55-1                                          cuda-rhel8-x86_64              99 k
 cuda-nvprof-11-6                                  x86_64             11.6.55-1                                          cuda-rhel8-x86_64             4.5 M
 cuda-nvprune-11-6                                 x86_64             11.6.55-1                                          cuda-rhel8-x86_64              67 k
 cuda-nvrtc-11-6                                   x86_64             11.6.55-1                                          cuda-rhel8-x86_64              20 M
 cuda-nvrtc-devel-11-6                             x86_64             11.6.55-1                                          cuda-rhel8-x86_64             100 M
 cuda-nvtx-11-6                                    x86_64             11.6.55-1                                          cuda-rhel8-x86_64              83 k
 cuda-nvvp-11-6                                    x86_64             11.6.58-1                                          cuda-rhel8-x86_64             115 M
 cuda-samples-11-6                                 x86_64             11.6.101-1                                         cuda-rhel8-x86_64              33 k
 cuda-sanitizer-11-6                               x86_64             11.6.55-1                                          cuda-rhel8-x86_64              17 M
 cuda-toolkit-11-6-config-common                   noarch             11.6.55-1                                          cuda-rhel8-x86_64             3.1 k
 cuda-toolkit-11-config-common                     noarch             11.6.55-1                                          cuda-rhel8-x86_64             3.4 k
 cuda-toolkit-config-common                        noarch             11.6.55-1                                          cuda-rhel8-x86_64             3.4 k
 cuda-tools-11-6                                   x86_64             11.6.0-1                                           cuda-rhel8-x86_64             2.7 k
 cuda-visual-tools-11-6                            x86_64             11.6.0-1                                           cuda-rhel8-x86_64             2.9 k
 dkms                                              noarch             3.0.3-1.el8                                        epel                           62 k
 dnf-plugin-nvidia                                 noarch             2.0-1.el8                                          cuda-rhel8-x86_64              12 k
 egl-wayland                                       x86_64             1.1.7-1.el8                                        appstream                      33 k
 gds-tools-11-6                                    x86_64             1.2.0.100-1                                        cuda-rhel8-x86_64              40 M
 giflib                                            x86_64             5.1.4-3.el8                                        appstream                      50 k
 java-1.8.0-openjdk-headless-slowdebug             x86_64             1:1.8.0.322.b06-2.el8_5                            powertools                     36 M
 java-1.8.0-openjdk-slowdebug                      x86_64             1:1.8.0.322.b06-2.el8_5                            powertools                    345 k
 libX11-xcb                                        x86_64             1.6.8-5.el8                                        appstream                      13 k
 libXdmcp                                          x86_64             1.1.3-1.el8                                        appstream                      40 k
 libXfont2                                         x86_64             2.0.3-2.el8                                        appstream                     147 k
 libXtst                                           x86_64             1.2.3-7.el8                                        appstream                      21 k
 libcublas-11-6                                    x86_64             11.8.1.74-1                                        cuda-rhel8-x86_64             314 M
 libcublas-devel-11-6                              x86_64             11.8.1.74-1                                        cuda-rhel8-x86_64             326 M
 libcufft-11-6                                     x86_64             10.7.0.55-1                                        cuda-rhel8-x86_64              98 M
 libcufft-devel-11-6                               x86_64             10.7.0.55-1                                        cuda-rhel8-x86_64             206 M
 libcufile-11-6                                    x86_64             1.2.0.100-1                                        cuda-rhel8-x86_64             627 k
 libcufile-devel-11-6                              x86_64             1.2.0.100-1                                        cuda-rhel8-x86_64              13 M
 libcurand-11-6                                    x86_64             10.2.9.55-1                                        cuda-rhel8-x86_64              51 M
 libcurand-devel-11-6                              x86_64             10.2.9.55-1                                        cuda-rhel8-x86_64              52 M
 libcusolver-11-6                                  x86_64             11.3.2.55-1                                        cuda-rhel8-x86_64             110 M
 libcusolver-devel-11-6                            x86_64             11.3.2.55-1                                        cuda-rhel8-x86_64              70 M
 libcusparse-11-6                                  x86_64             11.7.1.55-1                                        cuda-rhel8-x86_64             158 M
 libcusparse-devel-11-6                            x86_64             11.7.1.55-1                                        cuda-rhel8-x86_64             322 M
 libdrm                                            x86_64             2.4.106-2.el8                                      appstream                     166 k
 libepoxy                                          x86_64             1.5.8-1.el8                                        appstream                     224 k
 libevdev                                          x86_64             1.10.0-1.el8                                       appstream                      43 k
 libglvnd                                          x86_64             1:1.3.2-1.el8                                      appstream                     126 k
 libglvnd-egl                                      x86_64             1:1.3.2-1.el8                                      appstream                      48 k
 libglvnd-gles                                     x86_64             1:1.3.2-1.el8                                      appstream                      39 k
 libglvnd-glx                                      x86_64             1:1.3.2-1.el8                                      appstream                     136 k
 libglvnd-opengl                                   x86_64             1:1.3.2-1.el8                                      appstream                      46 k
 libinput                                          x86_64             1.16.3-2.el8                                       appstream                     216 k
 libnpp-11-6                                       x86_64             11.6.0.55-1                                        cuda-rhel8-x86_64             122 M
 libnpp-devel-11-6                                 x86_64             11.6.0.55-1                                        cuda-rhel8-x86_64             119 M
 libnvjpeg-11-6                                    x86_64             11.6.0.55-1                                        cuda-rhel8-x86_64             2.6 M
 libnvjpeg-devel-11-6                              x86_64             11.6.0.55-1                                        cuda-rhel8-x86_64             2.3 M
 libpciaccess                                      x86_64             0.14-1.el8                                         baseos                         31 k
 libvdpau                                          x86_64             1.4-2.el8                                          appstream                      40 k
 libwacom                                          x86_64             1.6-3.el8                                          appstream                      41 k
 libwacom-data                                     noarch             1.6-3.el8                                          appstream                     103 k
 libwayland-client                                 x86_64             1.19.0-1.el8                                       appstream                      38 k
 libwayland-server                                 x86_64             1.19.0-1.el8                                       appstream                      46 k
 libxkbfile                                        x86_64             1.1.0-1.el8                                        appstream                      87 k
 libxshmfence                                      x86_64             1.3-2.el8                                          appstream                      12 k
 llvm-libs                                         x86_64             12.0.1-2.module+el8.5.0+692+8756646f               appstream                      23 M
 mesa-libEGL                                       x86_64             21.1.5-1.el8.0                                     appstream                     134 k
 mesa-libGL                                        x86_64             21.1.5-1.el8.0                                     appstream                     183 k
 mesa-libgbm                                       x86_64             21.1.5-1.el8.0                                     appstream                      56 k
 mesa-libglapi                                     x86_64             21.1.5-1.el8.0                                     appstream                      64 k
 mesa-vulkan-drivers                               x86_64             21.1.5-1.el8.0                                     appstream                     6.1 M
 mtdev                                             x86_64             1.1.5-12.el8                                       appstream                      23 k
 nsight-compute-2022.1.0                           x86_64             2022.1.0.12-1                                      cuda-rhel8-x86_64             458 M
 nsight-systems-2021.5.2                           x86_64             2021.5.2.53_28d0e6e-0                              cuda-rhel8-x86_64             226 M
 nvidia-driver                                     x86_64             3:510.47.03-1.el8                                  cuda-rhel8-x86_64              22 M
 nvidia-driver-cuda-libs                           x86_64             3:510.47.03-1.el8                                  cuda-rhel8-x86_64              63 M
 nvidia-driver-libs                                x86_64             3:510.47.03-1.el8                                  cuda-rhel8-x86_64             168 M
 nvidia-kmod-common                                noarch             3:510.47.03-1.el8                                  cuda-rhel8-x86_64              12 k
 nvidia-persistenced                               x86_64             3:510.47.03-1.el8                                  cuda-rhel8-x86_64              42 k
 ocl-icd                                           x86_64             2.2.12-1.el8                                       appstream                      50 k
 opencl-filesystem                                 noarch             1.0-6.el8                                          appstream                     7.3 k
 ttmkfdir                                          x86_64             3.0.9-54.el8                                       appstream                      61 k
 vulkan-loader                                     x86_64             1.2.198.0-2.el8_5                                  appstream                     122 k
 xorg-x11-drv-fbdev                                x86_64             0.5.0-2.el8                                        appstream                      26 k
 xorg-x11-drv-libinput                             x86_64             0.29.0-1.el8                                       appstream                      49 k
 xorg-x11-drv-vesa                                 x86_64             2.4.0-3.el8                                        appstream                      30 k
 xorg-x11-fonts-Type1                              noarch             7.5-19.el8                                         appstream                     521 k
 xorg-x11-server-Xorg                              x86_64             1.20.11-2.el8                                      appstream                     1.5 M
 xorg-x11-server-common                            x86_64             1.20.11-2.el8                                      appstream                      41 k
 xorg-x11-xkb-utils                                x86_64             7.7-28.el8                                         appstream                     113 k
Enabling module streams:
 llvm-toolset                                                         rhel8
 nvidia-driver                                                        latest-dkms

Transaction Summary
=============================================================================================================================================================
Install  108 Packages

Total download size: 3.4 G
Installed size: 6.7 G
Is this ok [y/N]:

At this point I don’t know whether it is possible to install CUDA and the NVIDIA driver without installing Xorg.
Thanks for the help.

Ivan

It installs some of Xorg’s packages, but not all of Xorg. Yes, they will consume some disk space, though not much (it is CUDA and the NVIDIA driver that devour disk), and your setup remains headless.
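
To judge how much of the transaction is actually graphical, the dnf table can be filtered before confirming. This is only a sketch: `graphical_packages` is a hypothetical helper, and the name prefixes it matches are an illustrative guess based on the package lists in this thread, not a complete definition of "graphical".

```shell
# Read a dnf transaction table on stdin and print the names of packages
# whose names start with a prefix that looks X11/Wayland/GTK related.
graphical_packages() {
    awk '$1 ~ /^(xorg-x11|libX|libwayland|mesa-|gtk)/ { print $1 }'
}

# Example usage (--assumeno answers "no" to the prompt, so dnf prints
# the transaction and installs nothing):
#   dnf install --assumeno cuda-toolkit-11-6 nvidia-driver-cuda \
#       kmod-nvidia-latest-dkms | graphical_packages
```

Counting the lines of that output against the "Install  108 Packages" summary gives a rough idea of the graphical share.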

Ok, message received.
I installed using:

# dnf install cuda-toolkit-11-6 nvidia-driver-cuda kmod-nvidia-latest-dkms

Then I blacklisted nouveau, regenerated the initramfs, and rebooted.
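
For reference, those three steps are commonly done along the following lines. This is a sketch under assumptions, not the exact commands used here: the file name blacklist-nouveau.conf and the helper `write_nouveau_blacklist` are illustrative, and the modeset option is a frequently used addition rather than something stated in this thread.

```shell
# Write a modprobe configuration that keeps the nouveau module from loading.
# $1: destination file; on a real system this would typically live
#     under /etc/modprobe.d/ and must be written as root.
write_nouveau_blacklist() {
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$1"
}

# Example usage as root:
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   dracut --force     # rebuild the initramfs for the running kernel
#   reboot
```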
Now, when I run nvidia-smi, I get the following message:

# nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

What’s going on?

Ivan
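
One detail visible in the transaction lists above: nvidia-driver-NVML is installed by the module install but does not appear in the three-package install. Assuming that package is the one that ships libnvidia-ml.so (an assumption, not confirmed in this thread), a quick check could look like the sketch below; `has_nvml` is a hypothetical helper name.

```shell
# Succeed if the linker cache listing on stdin mentions libnvidia-ml.so.
has_nvml() {
    grep -q 'libnvidia-ml\.so'
}

# Example usage:
#   ldconfig -p | has_nvml && echo "NVML library found" || echo "NVML library missing"
#   rpm -q nvidia-driver-NVML    # reports whether the package is installed
```

If the library turns out to be missing, installing nvidia-driver-NVML from the same repository would be the obvious thing to try.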