I build a Docker image with the NVIDIA driver and import it into Warewulf for the compute nodes. The driver appears to work, since nvidia-smi runs, but on further testing something is wrong:
>>> print(torch.cuda.is_available())
False
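When nvidia-smi works but `torch.cuda.is_available()` returns False, two common causes are a CPU-only PyTorch wheel (`torch.version.cuda` is None) or a torch CUDA build newer than what the driver supports. A minimal sketch of that check, assuming you read the "CUDA Version" field from the nvidia-smi header yourself (the function names here are illustrative, not from any library):

```python
from typing import Optional


def runtime_compatible(driver_cuda: str, torch_cuda: Optional[str]) -> bool:
    """Return True if the driver's maximum supported CUDA version covers
    the CUDA version PyTorch was built against.

    driver_cuda: "CUDA Version" shown in the nvidia-smi header, e.g. "11.4"
    torch_cuda:  torch.version.cuda, e.g. "11.3"; None means a CPU-only build
    """
    if torch_cuda is None:
        # CPU-only wheel: no driver version will help, reinstall a CUDA build
        return False
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return as_tuple(driver_cuda) >= as_tuple(torch_cuda)
```

Inside the container you would compare `torch.version.cuda` against the version nvidia-smi reports; if `torch.version.cuda` is None the problem is the PyTorch install, not the driver.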
Two months ago I tried installing the drivers directly from the repository via dnf and it didn't work. Later I tried again and it worked perfectly, and I was using it until yesterday, when I rebuilt the image with the repo driver and it stopped working again. I neglected to record the versions and ended up losing the image that worked.
Dockerfile I use for the official NVIDIA installer: https://raw.githubusercontent.com/luvres/hpc/master/dockerfiles/Dockerfile.r8ww-nvidia-slurm
Dockerfile for the NVIDIA driver from the repo: https://raw.githubusercontent.com/luvres/hpc/master/dockerfiles/Dockerfile.r8ww-nvrepo-slurm
The image worked up to kernel 4.18.0-372.26.1.el8_6.x86_64; now, with 4.18.0-372.32.1.el8_6.x86_64, the NVIDIA driver no longer works. Has anyone experienced this? Any help is welcome.
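One likely explanation, given that the breakage lines up exactly with the kernel update: the NVIDIA kernel module (nvidia.ko) is built against a specific kernel, so a module built under 4.18.0-372.26.1 will not load on 4.18.0-372.32.1 unless it is rebuilt (e.g. via DKMS or a matching precompiled kmod). A quick way to confirm is to compare `uname -r` with the module's vermagic from `modinfo -F vermagic nvidia`. A sketch of that comparison (the function name is illustrative; you would feed it the two command outputs):

```python
def module_matches_kernel(running_kernel: str, vermagic: str) -> bool:
    """Return True if nvidia.ko was built for the running kernel.

    running_kernel: output of `uname -r`,
                    e.g. "4.18.0-372.32.1.el8_6.x86_64"
    vermagic:       output of `modinfo -F vermagic nvidia`, whose first
                    field is the kernel the module was built for, e.g.
                    "4.18.0-372.26.1.el8_6.x86_64 SMP mod_unload modversions"
    """
    # The vermagic string starts with the build kernel; extra flags follow
    return vermagic.split()[0] == running_kernel
```

If these differ, the fix is to rebuild the driver for the new kernel inside the image (or pin the kernel packages with `dnf versionlock` so an image update cannot silently pull in a new kernel).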
Driver sources I tried:
- NVIDIA repo via dnf
- NVIDIA official download site