Kmod-nvidia vs. updates

Hi,

I’m running Rocky Linux 8 + KDE on my battered HP Workstation. I have an NVidia video card on this machine:

# lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)

I’m using the proprietary nvidia-x11-drv-470xx driver from ELRepo:

# rpm -qa | grep nvidia
kmod-nvidia-470xx-470.141.03-1.el8_6.elrepo.x86_64
nvidia-x11-drv-470xx-libs-470.141.03-1.el8_6.elrepo.x86_64
nvidia-detect-515.57-1.el8.x86_64
nvidia-x11-drv-470xx-470.141.03-1.el8_6.elrepo.x86_64

Apparently there’s an upgrade available:

# dnf check-update | grep nvidia
kmod-nvidia-470xx.x86_64                470.141.03-2.el8_7.elrepo         elrepo
nvidia-x11-drv-470xx.x86_64             470.141.03-2.el8_7.elrepo         elrepo
nvidia-x11-drv-470xx-libs.x86_64        470.141.03-2.el8_7.elrepo         elrepo

But an attempt to upgrade this spews out a tsunami of errors that look like this:

# dnf update
Error: 
 Problem: cannot install the best update candidate for package kmod-nvidia-470xx-470.141.03-1.el8_6.elrepo.x86_64
  - nothing provides kernel >= 4.18.0-425.3.1.el8 needed by kmod-nvidia-470xx-470.141.03-2.el8_7.elrepo.x86_64
  - nothing provides kernel(efi) = 0x66edb5ab needed by kmod-nvidia-470xx-470.141.03-2.el8_7.elrepo.x86_64
  - nothing provides kernel(__drm_atomic_helper_crtc_destroy_state) = 0x19d9cfcc needed by kmod-nvidia-470xx-470.141.03-2.el8_7.elrepo.x86_64
  - nothing provides kernel(__drm_atomic_helper_crtc_duplicate_state) = 0x0b326feb needed by kmod-nvidia-470xx-470.141.03-2.el8_7.elrepo.x86_64

Is there an explanation for this specific problem? A solution?

Cheers,

Niki

  1. Explanation: the update is an “el8_7” build.
  2. Solution: wait for Rocky 8.7.

Red Hat released RHEL 8.7 earlier this week, and ELRepo has already released packages for it.
Those “el8_7” kmods work only with 8.7’s 4.18.0-425.* kernels.
The ELRepo “el8_6” kmods work only with 8.6’s 4.18.0-372.* kernels.
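
To see which side of that split a machine is on, the pairing above can be written as a small lookup. This is only a sketch: the function name `elrepo_tag_for_kernel` is made up for illustration, and it knows only the two kernel series mentioned in this thread.

```shell
#!/bin/sh
# Sketch: map a kernel release string to the ELRepo kmod tag it pairs with.
# elrepo_tag_for_kernel is a hypothetical helper; only the 8.6 and 8.7
# kernel series named in this thread are covered.
elrepo_tag_for_kernel() {
    case "$1" in
        4.18.0-372.*) echo "el8_6" ;;
        4.18.0-425.*) echo "el8_7" ;;
        *)            echo "unknown" ;;
    esac
}

running="$(uname -r)"
echo "Kernel ${running} pairs with ELRepo tag: $(elrepo_tag_for_kernel "${running}")"

# Until the matching kernel is available, the driver packages can be held back:
#   dnf update --exclude='kmod-nvidia*' --exclude='nvidia-x11-drv*'
```

If the tag printed is el8_6, the el8_7 kmod cannot be installed yet, which is exactly what the “nothing provides kernel >= 4.18.0-425.3.1.el8” error says.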


Back in the old days (between 2001 and 2017, if I remember correctly) I used Slackware as my “workhorse” distribution. On machines with an NVidia video card, I would simply download and install the proprietary driver from NVidia.com. Whenever there was a kernel update, I would reboot into runlevel 3, rebuild the NVidia driver for the new kernel, and then boot to runlevel 4 (which was and still is Slackware’s graphical runlevel).

As far as I understand (and please correct me if I’m wrong), the purpose of DKMS in some distributions is to automate this kind of task. So my somewhat naive assumption would be that the “best” way to manage proprietary NVidia drivers is to use the “upstream” proprietary drivers from NVidia.com along with DKMS. This would let me avoid situations like this one, where the third-party repository is ahead of the official ones.

What’s your take on this?

Back then NVidia offered its proprietary driver as a “*.run” shell script. Today NVidia, ELRepo, RPM Fusion (and perhaps some others) offer it RPM-packaged, in Yum repositories. NVidia probably created its repo mainly to distribute CUDA packages.

All but ELRepo use DKMS or a similar mechanism (RPM Fusion’s is akmods): the kernel module is built for the kernel during boot, if it is not already there.

ELRepo exploits the fact that all RHEL kernels within a point release remain compatible, so there is no need to build the module more than once per point release. The ELRepo version of the package has been very easy to use.

For a GPU driver the DKMS approach is fine. In fact, the ELRepo maintainers have decided not to build kmod-nvidia for el9, because the other repos are in “good shape”. (They are in less good shape if you have an old card that requires an older legacy-series driver.)

The “auto-build on first boot” is not an option if the driver is for storage. You can neither load the kernel nor build the driver if you don’t already have a driver to access the disk that holds all your files. There we absolutely have to get the kernel module built before we boot. We could do it with *.run, and we can do it with DKMS. Presumably.

I actually have these tasks in an Ansible play for updates:

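  # First pass: update everything except kernels (and kmod-kvdo).
  # ELRepo is disabled here when use_elrepo is true, so its kmods wait for the kernel.
  - name: Update all, except kernel
    ansible.builtin.dnf:
      name: '*'
      state: latest
      exclude: 'kmod-kvdo,kernel*'
      disablerepo: "{{ use_elrepo | default(false) | ternary( 'elrepo', omit, omit ) }}"

  # Optional stop point: define break_before_kernel to end the play here.
  - name: Break play before kernel
    ansible.builtin.fail:
    when:
    - break_before_kernel is defined

  # Second pass: kernels plus, when use_elrepo is true, the ELRepo drivers.
  - name: Update all, including (optional) ELRepo drivers
    ansible.builtin.dnf:
      name: '*'
      state: latest
      enablerepo: "{{ use_elrepo | default(false) | ternary( 'elrepo', omit, omit ) }}"
    register: new_kernel

  # Reboot only if the second pass changed something.
  - name: Reboot with new kernel
    ansible.builtin.reboot:
    when: new_kernel.changed
    tags:
    - reboot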

If I call that play with variable break_before_kernel defined, then the kernel update is skipped altogether.

The reboot occurs only if the second dnf task does something. I should actually use needs-restarting (from yum-utils) to determine whether a reboot is required.
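
For reference, a rough sketch of that check in shell. `needs-restarting -r` exits with status 1 when a reboot is required and 0 when it is not. The wrapper name `reboot_required` and its optional argument (a substitute checker command, handy for dry runs) are my own inventions, not part of yum-utils.

```shell
#!/bin/sh
# Sketch: succeed only when needs-restarting reports that a reboot is required.
# reboot_required and its optional argument are hypothetical; the argument
# replaces the checker command, e.g. for testing.
reboot_required() {
    checker="${1:-needs-restarting}"
    status=0
    "$checker" -r >/dev/null 2>&1 || status=$?
    # Exit status 1 means "reboot required"; anything else (0, or 127 when the
    # command is missing) is treated as "not reported".
    [ "$status" -eq 1 ]
}

if reboot_required; then
    echo "Reboot required"
else
    echo "needs-restarting did not report a pending reboot"
fi
```

In the play, the same exit status could drive the reboot task instead of new_kernel.changed.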

For what it’s worth, I have the GT 710 card, which works with the NVidia 470xx driver. I got the driver from RPM Fusion, not ELRepo. I don’t remember whether anything but coincidence was behind my choice; I think it was coincidence, as I’m friendly with some of the ELRepo people and usually use their version. If I try to reboot right after an upgrade installs a kernel, I get a message that kmods are installing, please wait: even after the update reports that it is complete, the kmods take a bit of time to install. This is on a home workstation running almalinux-8.7, where a reboot isn’t something I have to avoid.

So, I would say that especially for home use, the ELRepo or RPM Fusion method is fine. Both have instructions for using their nvidia drivers on their respective sites. There is no need to use the upstream nvidia driver, as both ELRepo and RPM Fusion have already done the work for you.
