Rocky Linux 9.2 NVIDIA-SMI has failed

Hello there.
I am trying to install the vGPU driver on Rocky Linux 9.2, but I get an error.
I installed this driver: “nvidia-linux-grid-510-510.47.03-1.x86_64”

“# nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

lspci | grep -E "VGA|3D"

06:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Tesla P6] (rev a1)

dkms status

nvidia/510.47.03: added

dkms install nvidia/510.47.03

Sign command: /lib/modules/5.14.0-284.30.1.el9_2.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module:
Cleaning build area…
'make' -j4 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.14.0-284.30.1.el9_2.x86_64 IGNORE_CC_MISMATCH='1' modules…(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.14.0-284.30.1.el9_2.x86_64 (x86_64)
Consult /var/lib/dkms/nvidia/510.47.03/build/make.log for more information.

Output from the hypervisor:

nvidia-smi

Tue Nov 14 09:28:21 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P6            On   | 00000000:18:00.0 Off |                  Off |
| N/A   29C    P8     9W /  90W |  16234MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3006655    C+G   vgpu                            16192MiB |

Any advice will help me.
Thx

I just installed the NVIDIA driver without any problem.
Are you following NVIDIA’s installation document?
Follow the installation instructions for RHEL 9.

The build fails. Does /var/lib/dkms/nvidia/510.47.03/build/make.log explain why?
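
The log can be long, so it may help to pull out only the compiler errors. This is a rough sketch, not part of dkms: the function name dkms_build_errors is made up for illustration, and the default path is simply the one DKMS printed above.

```shell
# Print each distinct compiler error from a DKMS make.log, with a count.
# dkms_build_errors is a hypothetical helper name, not a dkms command.
dkms_build_errors() {
    log="${1:-/var/lib/dkms/nvidia/510.47.03/build/make.log}"
    # Keep only the text from "error:" onward, so the same error reported
    # from several source files collapses into a single line.
    grep -o 'error: .*' "$log" | sort | uniq -c | sort -rn
}
```

Run it as dkms_build_errors with no argument for the default path, or pass another make.log as the first argument. Warnings are ignored on purpose; only lines containing "error:" are counted.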


NVIDIA’s repo (cuda-rhel9-x86_64) seems to support driver versions 515–545, but not 510.

It does explain the failure, but I don’t understand what the errors mean.

“/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-dma-resv-helper.h: In function ‘nv_dma_resv_add_excl_fence’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-dma-resv-helper.h:72:5: error: implicit declaration of function ‘dma_resv_add_excl_fence’; did you mean ‘nv_dma_resv_add_excl_fence’? [-Werror=implicit-function-declaration]
72 | dma_resv_add_excl_fence(obj, fence);
| ^~~~~~~~~~~~~~~~~~~~~~~
| nv_dma_resv_add_excl_fence
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-drv.c: In function ‘nv_drm_init_mode_config’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-drv.c:257:21: error: ‘struct drm_mode_config’ has no member named ‘allow_fb_modifiers’
257 | dev->mode_config.allow_fb_modifiers = true;
| ^
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:321: /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-drv.o] Error 1
make[2]: *** Waiting for unfinished jobs…
In file included from /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-gem.h:45,
from /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-gem-nvkms-memory.h:30,
from /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-fb.h:38,
from /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:33:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-dma-resv-helper.h: In function ‘nv_dma_resv_add_excl_fence’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-dma-resv-helper.h:72:5: error: implicit declaration of function ‘dma_resv_add_excl_fence’; did you mean ‘nv_dma_resv_add_excl_fence’? [-Werror=implicit-function-declaration]
72 | dma_resv_add_excl_fence(obj, fence);
| ^~~~~~~~~~~~~~~~~~~~~~~
| nv_dma_resv_add_excl_fence
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c: In function ‘cursor_plane_req_config_update’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:81:32: warning: unused variable ‘nv_drm_plane_state’ [-Wunused-variable]
81 | struct nv_drm_plane_state *nv_drm_plane_state =
| ^~~~~~~~~~~~~~~~~~
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:80:27: warning: unused variable ‘nv_dev’ [-Wunused-variable]
80 | struct nv_drm_device *nv_dev = to_nv_device(plane->dev);
| ^~~~~~
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c: In function ‘plane_req_config_update’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:182:9: warning: unused variable ‘ret’ [-Wunused-variable]
182 | int ret = 0;
| ^~~
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c: In function ‘nv_drm_plane_atomic_set_property’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:497:32: warning: unused variable ‘nv_drm_plane_state’ [-Wunused-variable]
497 | struct nv_drm_plane_state *nv_drm_plane_state =
| ^~~~~~~~~~~~~~~~~~
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c: In function ‘nv_drm_enumerate_crtcs_and_planes’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:1141:13: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
1141 | struct drm_plane *overlay_plane =
| ^~~~~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:321: /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.o] Error 1
make[1]: *** [Makefile:1923: /var/lib/dkms/nvidia/510.47.03/build] Error 2
make[1]: Leaving directory ‘/usr/src/kernels/5.14.0-284.30.1.el9_2.x86_64’
make: *** [Makefile:82: modules] Error 2”

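I’d guess that the 510 driver is too old to be built for the current kernels. Both errors are about kernel DRM interfaces (dma_resv_add_excl_fence and the allow_fb_modifiers member of struct drm_mode_config) that the driver source expects but the 5.14.0-284.30.1.el9_2 kernel headers do not provide.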
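
One way to test that guess is to search the installed kernel headers for the two names the compiler complained about. A minimal sketch, assuming the kernel-devel headers sit under /lib/modules/$(uname -r)/build/include; header_declares is a hypothetical helper name, not an existing tool.

```shell
# Report whether an identifier appears anywhere under a kernel header tree.
# header_declares is a hypothetical helper; pass the include directory of
# the kernel build tree and the identifier to look for.
header_declares() {
    include_dir="$1"
    name="$2"
    # -w matches whole words only, so nv_dma_resv_add_excl_fence would not
    # count as a match for dma_resv_add_excl_fence.
    # A nonexistent directory is also reported as "missing".
    if grep -rqw -- "$name" "$include_dir" 2>/dev/null; then
        echo "$name: present"
    else
        echo "$name: missing"
    fi
}
```

For example, header_declares "/lib/modules/$(uname -r)/build/include" dma_resv_add_excl_fence, and the same again for allow_fb_modifiers. If both come back missing, the driver source and the kernel headers disagree, and a newer driver branch is the likely fix.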
