Rocky Linux 9.2 NVIDIA-SMI has failed

Hello there.
I am trying to install vGPU on Rocky Linux 9.2, but I get an error.
I installed this driver: nvidia-linux-grid-510-510.47.03-1.x86_64

# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

lspci | grep -E "VGA|3D"

06:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Tesla P6] (rev a1)

dkms status

nvidia/510.47.03: added

dkms install nvidia/510.47.03

Sign command: /lib/modules/5.14.0-284.30.1.el9_2.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module:
Cleaning build area…
'make' -j4 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.14.0-284.30.1.el9_2.x86_64 IGNORE_CC_MISMATCH='1' modules...(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.14.0-284.30.1.el9_2.x86_64 (x86_64)
Consult /var/lib/dkms/nvidia/510.47.03/build/make.log for more information.

From the hypervisor:

nvidia-smi

Tue Nov 14 09:28:21 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P6            On   | 00000000:18:00.0 Off |                  Off |
| N/A   29C    P8     9W /  90W |  16234MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3006655    C+G   vgpu                            16192MiB |
+-----------------------------------------------------------------------------+

Any advice would help.
Thanks

I just installed the NVIDIA driver without any problems.
Are you following NVIDIA's installation guide?
Follow the installation instructions for RHEL 9.

The build fails. Does /var/lib/dkms/nvidia/510.47.03/build/make.log explain why?
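The compiler's real complaint is usually buried among many warnings in make.log. As a rough sketch (show_build_errors is a hypothetical helper, not part of DKMS or the NVIDIA installer), a small shell function can print only the gcc "error:" lines and make's "Error N" lines, with their line numbers:

```shell
#!/bin/sh
# Hypothetical helper (not part of DKMS): print only the gcc "error:" lines
# and make's "Error N" lines from a build log, prefixed with line numbers.
# Returns 1 if the log is unreadable; otherwise grep's status
# (0 = matches found, 1 = nothing matched).
show_build_errors() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    grep -n -E 'error:|Error [0-9]+' "$log"
}
```

Usage with the log from this thread: `show_build_errors /var/lib/dkms/nvidia/510.47.03/build/make.log`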


NVIDIA's repo (cuda-rhel9-x86_64) seems to support driver versions 515–545, but not 510.

It does explain it, but I don't know what the errors mean:

/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-dma-resv-helper.h: In function ‘nv_dma_resv_add_excl_fence’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-dma-resv-helper.h:72:5: error: implicit declaration of function ‘dma_resv_add_excl_fence’; did you mean ‘nv_dma_resv_add_excl_fence’? [-Werror=implicit-function-declaration]
   72 |     dma_resv_add_excl_fence(obj, fence);
      |     ^~~~~~~~~~~~~~~~~~~~~~~
      |     nv_dma_resv_add_excl_fence
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-drv.c: In function ‘nv_drm_init_mode_config’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-drv.c:257:21: error: ‘struct drm_mode_config’ has no member named ‘allow_fb_modifiers’
  257 |     dev->mode_config.allow_fb_modifiers = true;
      |                     ^
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:321: /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-drv.o] Error 1
make[2]: *** Waiting for unfinished jobs...
In file included from /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-gem.h:45,
                 from /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-gem-nvkms-memory.h:30,
                 from /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-fb.h:38,
                 from /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:33:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-dma-resv-helper.h: In function ‘nv_dma_resv_add_excl_fence’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-dma-resv-helper.h:72:5: error: implicit declaration of function ‘dma_resv_add_excl_fence’; did you mean ‘nv_dma_resv_add_excl_fence’? [-Werror=implicit-function-declaration]
   72 |     dma_resv_add_excl_fence(obj, fence);
      |     ^~~~~~~~~~~~~~~~~~~~~~~
      |     nv_dma_resv_add_excl_fence
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c: In function ‘cursor_plane_req_config_update’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:81:32: warning: unused variable ‘nv_drm_plane_state’ [-Wunused-variable]
   81 |     struct nv_drm_plane_state *nv_drm_plane_state =
      |                                ^~~~~~~~~~~~~~~~~~
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:80:27: warning: unused variable ‘nv_dev’ [-Wunused-variable]
   80 |     struct nv_drm_device *nv_dev = to_nv_device(plane->dev);
      |                           ^~~~~~
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c: In function ‘plane_req_config_update’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:182:9: warning: unused variable ‘ret’ [-Wunused-variable]
  182 |     int ret = 0;
      |         ^~~
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c: In function ‘nv_drm_plane_atomic_set_property’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:497:32: warning: unused variable ‘nv_drm_plane_state’ [-Wunused-variable]
  497 |     struct nv_drm_plane_state *nv_drm_plane_state =
      |                                ^~~~~~~~~~~~~~~~~~
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c: In function ‘nv_drm_enumerate_crtcs_and_planes’:
/var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.c:1141:13: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
 1141 |             struct drm_plane *overlay_plane =
      |             ^~~~~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:321: /var/lib/dkms/nvidia/510.47.03/build/nvidia-drm/nvidia-drm-crtc.o] Error 1
make[1]: *** [Makefile:1923: /var/lib/dkms/nvidia/510.47.03/build] Error 2
make[1]: Leaving directory '/usr/src/kernels/5.14.0-284.30.1.el9_2.x86_64'
make: *** [Makefile:82: modules] Error 2

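I'd guess that 510 is too old to build against current kernels. The two hard errors show the driver source using a kernel function (dma_resv_add_excl_fence) and a struct member (drm_mode_config.allow_fb_modifiers) that this kernel does not provide; the remaining lines are only warnings.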
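One way to check that guess, sketched under assumptions: the errors name `dma_resv_add_excl_fence` and the `allow_fb_modifiers` member of `struct drm_mode_config`. The script below (header_mentions is a hypothetical helper) greps the kernel-devel headers for them, assuming they live in `include/linux/dma-resv.h` and `include/drm/drm_mode_config.h` under `/usr/src/kernels/<version>`, the tree make used in the log. A plain grep is only a heuristic: a mention inside a comment would also count as "found".

```shell
#!/bin/sh
# Sketch: check whether the running kernel's headers still provide the
# identifiers that the 510.47.03 nvidia-drm sources failed on.
# Assumption: kernel-devel headers are installed under /usr/src/kernels/<version>.

# header_mentions IDENT FILE: succeed if IDENT appears as a whole word in FILE.
header_mentions() {
    grep -q -w -- "$1" "$2"
}

kdir="/usr/src/kernels/$(uname -r)"

# Each entry is "identifier relative/header/path" (header paths are assumptions).
for pair in "dma_resv_add_excl_fence include/linux/dma-resv.h" \
            "allow_fb_modifiers include/drm/drm_mode_config.h"; do
    sym=${pair%% *}   # text before the first space
    rel=${pair#* }    # text after the first space
    hdr="$kdir/$rel"
    if [ ! -r "$hdr" ]; then
        echo "skip:    $hdr not readable (is kernel-devel installed?)"
    elif header_mentions "$sym" "$hdr"; then
        echo "found:   $sym in $hdr"
    else
        echo "missing: $sym not in $hdr"
    fi
done
```

If both identifiers come back "missing", the 510.47.03 nvidia-drm source cannot compile against this kernel as-is, which matches the make.log errors.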