Can't log in after nvidia driver install

I have a Quadro P2000 GPU that I’m using on Rocky 9.3 with KDE. After the install the machine boots and I log in, but then I get a black screen and it goes back to the login screen. Googling seems to indicate this is common on other distros, but I found no good fix for Rocky. I used the nvidia RHEL9 repository; maybe this won’t work? I’m still able to ssh into the system. I’ve been doing headless sysadmin on Linux since RH3 but have never once set up a workstation with a GPU and drivers, so I’m out of my area.

Here is how I installed the nvidia drivers from nvidia’s RHEL9 repository:

dnf install epel-release
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo

dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
dnf install nvidia-driver nvidia-settings
dnf install cuda-driver

These all installed fine. Driver is nvidia-driver-3:545.23.08-1.el9.x86_64

uname -a:

Linux Zoomer 5.14.0-362.13.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Dec 13 14:07:45 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

dmesg after logging in:

[   16.740802] nvidia-nvlink: Nvlink Core is being initialized, major device number 237

[   16.742966] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   16.756067] intel_rapl_common: Found RAPL domain package
[   16.756075] intel_rapl_common: Found RAPL domain dram
[   16.756079] intel_rapl_common: DRAM domain energy unit 15300pj
[   16.757070] intel_rapl_common: Found RAPL domain package
[   16.757080] intel_rapl_common: Found RAPL domain dram
[   16.757086] intel_rapl_common: DRAM domain energy unit 15300pj
[   16.945814] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  545.23.08  Mon Nov  6 23:49:37 UTC 2023
[   17.013038] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[   17.413341] nvidia-uvm: Loaded the UVM driver, major device number 235.
[   17.443597] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  545.23.08  Mon Nov  6 23:23:07 UTC 2023
[   17.458450] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[   17.458453] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 0
[   17.479476] XFS (dm-2): Mounting V5 Filesystem
[   17.494614] XFS (sdb2): Mounting V5 Filesystem
[   17.661618] XFS (dm-2): Ending clean mount
[   17.676396] XFS (sdb2): Ending clean mount
[   18.150966] RPC: Registered named UNIX socket transport module.
[   18.150969] RPC: Registered udp transport module.
[   18.150970] RPC: Registered tcp transport module.
[   18.150970] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   18.762823] msr: Write to unrecognized MSR 0x17f by mcelog (pid: 1956).
[   18.762827] msr: See https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about for details.
[   19.845186] NET: Registered PF_QIPCRTR protocol family
[   21.541893] ixgbe 0000:86:00.0: registered PHC device on ens5f0
[   21.858924] ixgbe 0000:86:00.1: registered PHC device on ens5f1
[   24.928876] e1000e 0000:00:19.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[   24.928937] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[   28.826016] block dm-0: the capability attribute has been deprecated.
[   28.950196] mpt3sas 0000:01:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
[ 1074.172111] logitech-hidpp-device 0003:046D:4051.0007: HID++ 4.5 device connected.

journalctl -xe after logging in:

Dec 28 10:32:26 Zoomer sddm-helper[5496]: Starting X11 session: "" "/usr/bin/sddm-greeter --socket /tmp/sddm-:0-qHVPRL --theme /usr/share/sddm/themes/breeze"
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: Loading file:///usr/share/sddm/themes/breeze/Main.qml...
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: QObject: Cannot create children for a parent that is in a different thread.
                                           (Parent is QGuiApplication(0x7fff933d82e0), parent's thread is QThread(0x56175e3653e0), current thread is QThread(0x56175e45d2d0)
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: QObject: Cannot create children for a parent that is in a different thread.
                                           (Parent is QGuiApplication(0x7fff933d82e0), parent's thread is QThread(0x56175e3653e0), current thread is QThread(0x56175e45d2d0)
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: QObject: Cannot create children for a parent that is in a different thread.
                                           (Parent is QGuiApplication(0x7fff933d82e0), parent's thread is QThread(0x56175e3653e0), current thread is QThread(0x56175e45d2d0)
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: QObject: Cannot create children for a parent that is in a different thread.
                                           (Parent is QGuiApplication(0x7fff933d82e0), parent's thread is QThread(0x56175e3653e0), current thread is QThread(0x56175e45d2d0)
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: QObject: Cannot create children for a parent that is in a different thread.
                                           (Parent is QGuiApplication(0x7fff933d82e0), parent's thread is QThread(0x56175e3653e0), current thread is QThread(0x56175e45d2d0)
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: QObject: Cannot create children for a parent that is in a different thread.
                                           (Parent is QGuiApplication(0x7fff933d82e0), parent's thread is QThread(0x56175e3653e0), current thread is QThread(0x56175e45d2d0)
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: QObject: Cannot create children for a parent that is in a different thread.
                                           (Parent is QGuiApplication(0x7fff933d82e0), parent's thread is QThread(0x56175e3653e0), current thread is QThread(0x56175e45d2d0)
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: QObject::installEventFilter(): Cannot filter events for objects in a different thread.
Dec 28 10:32:26 Zoomer sddm-greeter[5497]: Failed to find a Kirigami platform plugin
Dec 28 10:32:27 Zoomer sddm-greeter[5497]: Qt Quick Layouts: Polish loop detected. Aborting after two iterations.
Dec 28 10:32:27 Zoomer sddm-greeter[5497]: Hunspell dictionary is missing for "en_US" . Search paths ("/usr/share/qt5/qtvirtualkeyboard/hunspell", "/usr/share/hunspell", "/usr/share/myspell/dicts")

xorg.conf.d/10-nvidia:

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    Option "PrimaryGPU" "yes"
    Option "SLI" "Auto"
    Option "BaseMosaic" "on"
EndSection

Section "OutputClass"
    Identifier "intel"
    MatchDriver "i915"
    Driver "modesetting"
EndSection

output of nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P2000                   Off | 00000000:03:00.0  On |                  N/A |
| 47%   34C    P8               5W /  75W |    152MiB /  5120MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      7724      G   /usr/libexec/Xorg                            40MiB |
|    0   N/A  N/A      7909      G   /usr/bin/sddm-greeter                       109MiB |
+---------------------------------------------------------------------------------------+

lshw -c video:
  *-display
       description: VGA compatible controller
       product: GP106GL [Quadro P2000]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:03:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:239 memory:f2000000-f2ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff ioport:3000(size=128) memory:c0000-dffff
  *-graphics
       product: EFI VGA
       physical id: 1
       logical name: /dev/fb0
       capabilities: fb
       configuration: depth=32 resolution=1024,768

Secure Boot is turned off in the BIOS.

I can do a Ctrl-Alt-F3 on the console, log in with the same user, and run startx, and everything works fine.

Any ideas would be appreciated!

Just to rule it out, have you done systemctl set-default graphical.target?

Well it’s booting to KDE but I just did:

# systemctl get-default
graphical.target
# systemctl set-default graphical.target
# reboot

And I still have the same problem.

What display manager is configured? What does the command below report?

systemctl status display-manager

If it is stopped and disabled, then start and enable it.

sddm seems to be running:

# systemctl status display-manager
● sddm.service - Simple Desktop Display Manager
     Loaded: loaded (/usr/lib/systemd/system/sddm.service; enabled; preset: enabled)
     Active: active (running) since Fri 2023-12-29 09:24:45 CST; 39min ago
       Docs: man:sddm(1)
             man:sddm.conf(5)
   Main PID: 2606 (sddm)
      Tasks: 4 (limit: 1646266)
     Memory: 141.9M
        CPU: 1.902s
     CGroup: /system.slice/sddm.service
             ├─2606 /usr/bin/sddm
             └─2736 /usr/libexec/Xorg -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_UlpqzZ -noreset -displayfd 15

Dec 29 09:24:45 Zoomer systemd[1]: Started Simple Desktop Display Manager.
Dec 29 09:24:48 Zoomer sddm-helper[3195]: pam_unix(sddm-greeter:session): session opened for user sddm(uid=976) by (uid=0)
Dec 29 09:24:48 Zoomer sddm-helper[3195]: Starting X11 session: "" "/usr/bin/sddm-greeter --socket /tmp/sddm-:0-dPgLGt --theme /usr/share/sddm/themes/breeze"

Can you try to connect your monitor to your onboard port instead of your GPU and reboot to see if there’s actually a problem with your GPU driver module?

I don’t have an onboard port but I’ll try a non-nvidia card and report back.

A non-nvidia GPU logs in fine. I also uninstalled the nvidia drivers and it logged in. I reinstalled the nvidia drivers and it no longer logs in.

From the Mattermost chat, as I see you asked there too: Howto/NVIDIA - RPM Fusion

You may find that your card, depending on the model, is not supported by the 510+ driver, and you may need to use an older one, e.g. 470, 390, 340, etc.

Check your model and follow the instructions linked and it should work fine. I had a laptop with Fedora that went to a black screen with newer drivers; I cannot remember now whether it was 510 and reverting to 470, or 470 and reverting to 390. Either way, this could be a reason for your problem.

Thanks, I saw that on the chat.

I’ve used this card (Quadro P2000 released in 2017) on Rocky 9.0 with Gnome and it worked fine with the latest driver. The rpmfusion document says the latest driver will work with 2014+ cards. This seems to be something going on with KDE/nvidia login since I can Ctrl-Alt-F3 and login on the command line and then do startx and all is fine.

I can also switch to Plasma (X11) (in the lower left corner of the login screen) and that logs in fine.

I’ll give the rpmfusion drivers a try though.
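For anyone comparing the two session types: the entries in the login screen's session menu normally come from .desktop files under /usr/share/xsessions and /usr/share/wayland-sessions. A rough sketch for listing them follows; list_sessions is a made-up helper name, and the optional argument (an alternative root directory) is only there so it can be tried against a test tree.

```shell
#!/bin/sh
# Hypothetical helper, not part of sddm or KDE.
# Prints one line per session file: type, file name, and the Name= value.
# ROOT defaults to /usr/share.
list_sessions() {
    root=${1:-/usr/share}
    for kind in xsessions wayland-sessions; do
        for f in "$root/$kind"/*.desktop; do
            # Skip the unexpanded glob when a directory is empty or missing.
            [ -f "$f" ] || continue
            name=$(sed -n 's/^Name=//p' "$f" | head -n 1)
            printf '%s\t%s\t%s\n' "$kind" "$(basename "$f")" "$name"
        done
    done
}
```

Calling list_sessions with no argument lists what is installed. Inside a running session, echo "$XDG_SESSION_TYPE" should report x11 or wayland, which tells you which kind you actually landed in.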

As far as I know Wayland doesn’t work with nvidia.

OK, we can call this solved.

The special sauce is adding nvidia_drm.modeset=1 to the kernel command line in grub. For anyone reading this in the future, here are the steps to get the RHEL9 drivers from the nvidia website to work with Wayland/KDE running on Rocky 9.3.

Download the Rocky KDE distro

dnf install epel-release
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo

dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
dnf install nvidia-driver nvidia-settings
dnf install cuda-driver

Then add nvidia_drm.modeset=1 to the kernel command line, either permanently in /etc/default/grub or for a single boot by editing the entry at the grub menu.
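After rebooting, one way to confirm the parameter took effect is to look for it on the running kernel's command line and, if the module is loaded, in sysfs. A minimal sketch, assuming the usual locations /proc/cmdline and /sys/module/nvidia_drm/parameters/modeset; the function names and the optional file arguments are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the nvidia packages.

# Succeeds if ARG appears as a whole word on the kernel command line
# stored in FILE (default: /proc/cmdline).
cmdline_has_arg() {
    arg=$1
    file=${2:-/proc/cmdline}
    # Put each argument on its own line, then look for an exact match.
    tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "$arg"
}

# Prints the modeset state reported by the loaded nvidia_drm module
# (Y or N), or "unknown" if the sysfs file cannot be read.
modeset_state() {
    file=${1:-/sys/module/nvidia_drm/parameters/modeset}
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo unknown
    fi
}
```

Typical use would be cmdline_has_arg nvidia_drm.modeset=1 && echo present, followed by modeset_state. The sysfs file may only be readable by root, in which case an unprivileged run prints unknown.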


Also a bug in 9.3 Rocky prevents grub from being updated. Do this to make it work:

add nvidia_drm.modeset=1 in /etc/default/grub 
grub2-mkconfig --update-bls-cmdline -o /boot/grub2/grub.cfg  (rocky 9.3 or later)
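If you would rather script the edit than open the file by hand, something along these lines could work. It is only a sketch: add_grub_arg is a made-up helper, it assumes GRUB_CMDLINE_LINUX sits on a single double-quoted line, that the argument contains no |, & or backslash, and that GNU sed (sed -i) is available.

```shell
#!/bin/sh
# Hypothetical helper, not part of grub2 or grubby.
# Appends ARG to the GRUB_CMDLINE_LINUX="..." line of FILE unless it is
# already present. FILE defaults to /etc/default/grub.
add_grub_arg() {
    arg=$1
    file=${2:-/etc/default/grub}
    # Give up if there is no quoted GRUB_CMDLINE_LINUX line to edit.
    grep -q '^GRUB_CMDLINE_LINUX="' "$file" || return 1
    # Extract the current value between the quotes.
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX="\(.*\)"[[:space:]]*$/\1/p' "$file")
    # Nothing to do if ARG is already there as a whole word.
    case " $current " in
        *" $arg "*) return 0 ;;
    esac
    # Append ARG just before the closing quote, editing the file in place.
    sed -i "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"[[:space:]]*\$|GRUB_CMDLINE_LINUX=\"\1 $arg\"|" "$file"
}
```

Run it as root, for example add_grub_arg nvidia_drm.modeset=1, and then regenerate the config with the grub2-mkconfig command above. Trying it on a copy of /etc/default/grub first is probably wise.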

So where is this option to grub2-mkconfig documented? The “man page” is very sparse. I didn’t find any reference in the info pages either. I assume this option updates the /etc/kernel/cmdline file so that the kernel parameters are added automatically when new kernels are installed.

It was introduced in RHEL 9.3; see the release notes: Chapter 4. New features Red Hat Enterprise Linux 9 | Red Hat Customer Portal

Not sure whether “BLS snippets” and /etc/kernel/cmdline get touched or what.

The --help does not say much either:

$ grub2-mkconfig --help
Usage: grub2-mkconfig [OPTION]
Generate a grub config file

  -o, --output=FILE       output generated config to FILE [default=stdout]
  --no-grubenv-update     do not update variables in the grubenv file
  --update-bls-cmdline    overwrite BLS cmdline args with default args
  -h, --help              print this message and exit
  -V, --version           print the version information and exit

Report bugs to <bug-grub@gnu.org>.

Yeah, it is inconclusive whether it populates /etc/kernel/cmdline or just the files in “/boot/loader/entries/<machineID>-$(uname -r).conf”. I think just the latter. RH policy per their RL9 documentation is that the sysadmin is expected to run grub2-mkconfig after each new kernel install. It is frustrating that they don’t connect the dots and clarify that if you use the command
grubby --update-kernel=ALL --args="new args"
then this will populate the cmdline file with your parameters, and from then on they are applied automatically to each new kernel.

Where do they say that last part?
Ahh, found it: 9.0 Release Notes Red Hat Enterprise Linux 9 | Red Hat Customer Portal

It is clear that if you modify /etc/default/grub, then you have to run grub2-mkconfig to put the changes into use.

The kernel-core package has a script that runs kernel-install. There is man kernel-install,
and /usr/lib/kernel/install.d/90-loaderentry.install might be where the magic happens.
That seems able to read /etc/kernel/cmdline.


The question remains, what creates/updates /etc/kernel/cmdline?

That seems to be the issue: if a change to /etc/default/grub is not propagated to /etc/kernel/cmdline, and since the install uses(?) the latter rather than the options from the previous/running kernel (as it used to do), our changes are ignored.
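One way to see whether a change actually reached the boot entries is to check each BLS snippet for the argument. A rough sketch follows; entries_missing_arg is a made-up name, and it assumes the arguments are written literally on the options line of each .conf file rather than through a variable such as $kernelopts.

```shell
#!/bin/sh
# Hypothetical helper, not part of grubby or systemd.
# Prints every BLS entry under DIR whose "options" line lacks ARG as a
# whole word. Prints nothing when all entries carry it.
# DIR defaults to /boot/loader/entries.
entries_missing_arg() {
    arg=$1
    dir=${2:-/boot/loader/entries}
    for entry in "$dir"/*.conf; do
        # Skip the unexpanded glob when the directory is empty or unreadable.
        [ -f "$entry" ] || continue
        # Collect the arguments from all "options" lines onto one line.
        opts=$(sed -n 's/^options[[:space:]]\{1,\}//p' "$entry" | tr '\n' ' ')
        case " $opts " in
            *" $arg "*) ;;
            *) printf '%s\n' "$entry" ;;
        esac
    done
}
```

Running entries_missing_arg nvidia_drm.modeset=1 (probably as root, since the entries may not be world-readable) after a kernel update shows whether the new entry picked up the argument. Looking at /etc/kernel/cmdline with cat, if the file exists, covers the other half of the question.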

That statement isn’t in the release notes; it is in the main RL9 documentation describing the management of kernel installs with grub.

What creates /etc/kernel/cmdline was discussed in this thread here:

cmdline creation

Those referenced scripts are provided in
systemd-udev-252-18.el9.x86_64

So the update / creation of cmdline is up to the maintainers of those scripts and is outside the scope of grubby itself.

Did you check how much free space you have on your system or home partitions? GUIs often need at least 1 GB of free space to load. Have you tried the open-source Nouveau drivers rather than the legacy NVIDIA drivers? I’ve found that Nouveau, combined with Mesa, usually works better than the legacy drivers.
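For the free-space check, df -h / /home gives a quick overview. If you want a scripted version, a small sketch could look like this; has_free_kb is a made-up helper, and it assumes the mount point name contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper.
# Succeeds if the filesystem holding DIR has at least MIN_KB kilobytes
# available, according to the POSIX-format output of df.
has_free_kb() {
    dir=$1
    min_kb=$2
    # With -P the fourth column of the second line is the available space.
    avail=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$min_kb" ]
}
```

For example, has_free_kb /home 1048576 || echo "less than 1 GiB free on /home".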

Also, make sure you use a normal user account rather than root when using a GUI. Root is never a good option for GUIs.