SSH change between 8.5 and 8.7?

I’m hoping someone out there can help with this. I have a system that was installed as Rocky 8.5 and later upgraded to 8.7. SSSD is configured to use Kerberos for auth and LDAP for users/groups. This is the infrastructure we have and it’s been working. I do the configuration via an Ansible playbook.

Since the upgrade to 8.7, users cannot log in with SSH using a password, which is how most of our users access the system. I use an ssh key for passwordless login and that works fine.

The error in the log is “Failed password for dobrie2”. I can’t find anything in the logs that’s an obvious failure (or even a non-obvious one). sssctl shows “KERBEROS: not connected.” A RHEL 8.5 server that’s working displays a message that lists the Kerberos server that’s configured in sssd.conf. SSSD is enumerating users and groups correctly (the “id dobrie2” command works as expected), and I can use kinit to create a Kerberos ticket successfully.

I’ve been on a deep dive of SSSD and PAM and I can’t find a configuration difference between a working system and a non-working system. It feels like a bug or some under-the-hood change to something. I’ve done a fair bit of digging and I can’t quite find anyone else having this issue, or a bug report that seems on-target. I’ve tried reverting to the 8.5 version of SSSD, but that didn’t seem to work. Where I’m at right now is a clean, unpatched 8.5 install that’s had my standard Ansible plays run up to the point where authentication works. And it works like a charm.

I have NOT yet

  • tried upgrading the system to 8.8
  • tried upgrading the working 8.5 to a patched 8.5 or 8.7 to see where it breaks
  • dug deeper into bug reports to see if anything turns up.

Anyone have a suggestion as to where I should look? I’m going to patch my test system to 8.5-latest, then upgrade to 8.7 and 8.8 so I can confirm where it breaks. I should probably do 8.6, too, just to be thorough.

Thanks in advance.

SSSD generally gets a version rebase with each point release, sometimes with (minor) Kerberos changes. Have you gone through the SSSD troubleshooting guide? The debug logs can be very verbose and hard to parse, but they can help you pinpoint where the issue likely is.

https://sssd.io/troubleshooting/basics.html

I have seen that document and have worked through it. I can’t actually see where the login gets referred to kerberos on the non-working system. A working system does have some stuff in /var/log/sssd/krb5_child.log with the debug turned up to 6.
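For anyone following along, here is a minimal Python sketch of one way to turn the debug level up to 6 in every section of an sssd.conf at once. The sample configuration and the domain name example.com are hypothetical, not taken from the affected system; debug_level is the standard SSSD option. Note that configparser drops comments when it rewrites the file, so this is only a rough illustration, not a replacement for editing the file by hand or through Ansible.

```python
import configparser
import io

# Hypothetical sssd.conf contents, for illustration only.
SAMPLE_SSSD_CONF = """\
[sssd]
services = nss, pam
domains = example.com

[pam]

[domain/example.com]
id_provider = ldap
auth_provider = krb5
"""


def set_debug_level(conf_text: str, level: int = 6) -> str:
    """Return conf_text with debug_level set in every section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    for section in parser.sections():
        parser[section]["debug_level"] = str(level)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(set_debug_level(SAMPLE_SSSD_CONF, 6))
```

SSSD has to be restarted after the change before the extra logging shows up under /var/log/sssd/.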

That seems fishy, but I don’t consider it determinative yet.

I just put the patches on 8.5 and I’m rebooting. I forgot to crank up the debug. Hopefully it still works and I can get a sample of what it’s supposed to look like.

Here’s an update on my findings:

A clean system kept allowing ssh+password logins through every upgrade up to a fully-patched Rocky 8.6. A base 8.7 system, a fully-patched 8.7, and a fully-patched 8.8 do not work.

Working:
2023-07-31T16:22:35-0400 SUBDEBUG Installed: sssd-2.5.2-2.el8.x86_64
2023-07-31T17:12:03-0400 SUBDEBUG Upgrade: sssd-2.5.2-2.el8_5.4.x86_64
2023-07-31T17:37:34-0400 SUBDEBUG Upgrade: sssd-2.6.2-3.el8.x86_64
2023-07-31T17:56:33-0400 SUBDEBUG Upgrade: sssd-2.6.2-4.el8_6.1.x86_64

Not working:
2023-07-31T18:19:28-0400 SUBDEBUG Upgrade: sssd-2.7.3-4.el8_7.1.x86_64
2023-07-31T18:57:14-0400 SUBDEBUG Upgrade: sssd-2.7.3-4.el8_7.3.x86_64
2023-07-31T19:11:54-0400 SUBDEBUG Upgrade: sssd-2.8.2-2.el8.x86_64

So something happens between 2.6 and 2.7 that breaks my configuration. Starting with 8.7, I don’t have a file /var/log/sssd/krb5_child.log. I’m sure that’s significant, but I’m not sure how yet. I want to dig into the docs a bit and see if there’s something new in the configuration files I need to tweak before calling it a bug.
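The version history above can be pulled out of the dnf log mechanically. The following is a small sketch, assuming the log lines look exactly like the ones quoted in this post (on my systems they come from /var/log/dnf.rpm.log, but treat that path as an assumption). The function name package_history is made up for this example, and the regular expression only handles the "Installed" and "Upgrade" actions seen here.

```python
import re

# Log lines copied from the post above.
DNF_LOG = """\
2023-07-31T16:22:35-0400 SUBDEBUG Installed: sssd-2.5.2-2.el8.x86_64
2023-07-31T17:12:03-0400 SUBDEBUG Upgrade: sssd-2.5.2-2.el8_5.4.x86_64
2023-07-31T17:37:34-0400 SUBDEBUG Upgrade: sssd-2.6.2-3.el8.x86_64
2023-07-31T17:56:33-0400 SUBDEBUG Upgrade: sssd-2.6.2-4.el8_6.1.x86_64
2023-07-31T18:19:28-0400 SUBDEBUG Upgrade: sssd-2.7.3-4.el8_7.1.x86_64
2023-07-31T18:57:14-0400 SUBDEBUG Upgrade: sssd-2.7.3-4.el8_7.3.x86_64
2023-07-31T19:11:54-0400 SUBDEBUG Upgrade: sssd-2.8.2-2.el8.x86_64
"""

# name-version-release.arch, where version and release contain no hyphen.
LINE_RE = re.compile(
    r"^(?P<when>\S+) SUBDEBUG (?P<action>Installed|Upgrade): "
    r"(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def package_history(log_text: str, package: str) -> list[str]:
    """Return the version-release strings logged for one package, in order."""
    history = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("name") == package:
            history.append(f"{match.group('version')}-{match.group('release')}")
    return history


if __name__ == "__main__":
    for entry in package_history(DNF_LOG, "sssd"):
        print(entry)
```

Running the same function with any other package name gives that package's upgrade path, which makes it easier to line up what else changed at the same time as SSSD.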

Rocky 8.8 and 9.2 are both using the same point releases of sssd, and both appear to have the same problem.

I would definitely get the debug logs between a working system and a non-working system. The configuration will be helpful too to try to drill down to the problem. This can help determine whether this is a bug or a change that requires a configuration modification as a result.

Something to keep in mind as we try to troubleshoot your issue: SSSD, kerberos, pam, nss, samba, and openldap are all tightly coupled. I know that you are mainly using sssd with kerberos and ldap support to do auth. If your configuration is now broken from one point release to another, it may not be a bug, but a change that may require a configuration modification somewhere. Since the aforementioned utilities are interlinked to work with FreeIPA (which utilizes 389ds ldap and kerberos heavily), my team tries to ensure that IPA is generally working before releases. I’m not ruling it out as a bug yet.

Samba and sssd are not tightly coupled. In fact, you shouldn’t use them together.

SSSD requires Samba libraries to compile properly. In some configurations they can be used together.

My point about them being tightly coupled was that these libraries and pieces of software are typically compiled together when being developed and released for RHEL and Rocky Linux. That doesn’t mean they all have to be installed together. It just means that we are very careful about which versions are compiled and in what order, and if something has gone wrong we need to figure out whether it was the versions and/or the order of the software we’re compiling.

I know that sssd requires some of the Samba libs and that, theoretically, you can use sssd with Samba, but it isn’t supported by Red Hat.
If you just want authentication from AD, then sssd is great, but as soon as Samba enters the scene, sssd should go away. If you are running a version of Samba >= 4.8.0 with ‘security = ADS’, then you also need to run winbind.
sssd and winbind are very similar (not surprising, a lot of both was written by the same person), so why run both? There is also the fact that Samba has quite a few different idmap backends that can do the same as idmap_sss (which, by the way, isn’t supplied or maintained by Samba) and are supported by Samba.
It comes down to: using sssd with Samba isn’t supported by any entity and isn’t required. Samba as a Unix domain member will work very well without it.

since this still appears to be an issue, i just wanted to add how i would troubleshoot this.
forget what's been upgraded and just focus on the debug output for now.
run a second sshd in debug mode on the server with /usr/sbin/sshd -ddd -p 666 (in debug mode it stays in the foreground and handles a single connection).
you do this so you can leave the existing sshd in place.
if you are using firewalld then open port 666 or whatever port you choose to use.
you may have to temporarily put selinux in permissive mode with setenforce 0, as selinux knows what ports are used for ssh and will block the debug port.
alternatively add the port to selinux
semanage port -a -t ssh_port_t -p tcp 666
now from another server ssh -vvv -p 666 servername
and collect the logs.
next rerun the debug sshd, but under strace
ie strace -f /usr/sbin/sshd -ddd -p 666
and collect the logs.
examine the bits where the authentication takes place and see if you get a clue from the sshd logs why it failed password auth and fell through to the next authentication option.
i've seen this very same behaviour years ago, on rhel 4 or 5 when some libraries needed updating and that's how i found it, not that i'm suggesting your libraries are out of date, it's just the method.
and we also used kerberos.
regards peter

I am pleased to report that I found the problem. (I ended up reproducing this on RHEL and opening a support ticket. THAT was quite the adventure. They were not able to identify the issue before I did.)

TL;DR:
This was caused by a change in the PAM package introduced in 8.7 with pam-1.3.1-22.el8. It’s definitely an edge case affected by changes we made to /etc/login.defs that, until now, worked as expected.

Long version:

After a bit of discussion with a colleague, I remembered that I had tried to roll back the SSSD packages to the ones from 8.6 and this did not correct the issue. I was much more focused at the time on getting it working than finding the problem, so I didn’t connect it until now. I tried again on a test system. The sssd-2.6.2 from 8.6, which worked on a clean install, did not work after downgrading, so we realized the cause of the problem was somewhere else.

We next downgraded PAM from pam-1.3.1-22.el8 (provided with RHEL 8.7) to pam-1.3.1-16.el8_6.1 (a RHEL 8.6 update). Success! Logging in with username and password worked again!
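In hindsight, a package-level diff between a working and a non-working system would have pointed at PAM sooner. Below is a rough Python sketch of that comparison. It assumes each system's package list was saved as "name version-release" lines, for example with rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'. The pam and sssd versions are the ones from this thread; examplepkg is a made-up placeholder for a package that did not change, and the function names are hypothetical.

```python
# Package lists in "name version-release" form, one package per line.
WORKING = """\
examplepkg 1.0-1.el8
pam 1.3.1-16.el8_6.1
sssd 2.6.2-4.el8_6.1
"""

BROKEN = """\
examplepkg 1.0-1.el8
pam 1.3.1-22.el8
sssd 2.7.3-4.el8_7.1
"""


def parse_packages(text: str) -> dict[str, str]:
    """Map package name to version-release, skipping malformed lines."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            packages[parts[0]] = parts[1]
    return packages


def changed_packages(working: str, broken: str) -> dict[str, tuple[str, str]]:
    """Return packages present on both systems whose version-release differs."""
    old = parse_packages(working)
    new = parse_packages(broken)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


if __name__ == "__main__":
    for name, (old, new) in changed_packages(WORKING, BROKEN).items():
        print(f"{name}: {old} -> {new}")
```

On a real point-release upgrade the list will be long, but it narrows the candidates for downgrade testing to packages that actually changed.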

Having identified the versions of pam that work/don’t work, I compared the changes between the pam-1.3.1-16.el8_6.1 and pam-1.3.1-22.el8 SRPMS. This patch from the 1.3.1-22 SRPM seems to be the culprit:

pam-1.3.1-pam-usertype-SYS_UID_MAX.patch

In our environment, we use local accounts and accounts from an LDAP directory. In certain cases in the past, there have been collisions between local UIDs and LDAP POSIX IDs. In order to avoid these collisions, we’ve modified the login.defs file:

#
# Min/max values for automatic uid selection in useradd
#
UID_MIN               1000001
UID_MAX               1009999
# System accounts
SYS_UID_MIN            999001
SYS_UID_MAX            999999

#
# Min/max values for automatic gid selection in groupadd
#
GID_MIN               1000001
GID_MAX               1009999
# System accounts
SYS_GID_MIN            999001
SYS_GID_MAX            999999

Through RHEL 8.6, PAM is configured by default to prevent system IDs from logging in interactively. The cut-off between “system” accounts and “user” accounts (which was a breaking change between RHEL6 and RHEL7) was hard-coded in /etc/pam.d/system-auth. In that file, the line is

account sufficient pam_succeed_if.so uid < 1000 quiet

After the application of the patch, the cut-off values are now determined programmatically by examining the settings in /etc/login.defs. The corresponding line in /etc/pam.d/system-auth is now

auth [default=1 ignore=ignore success=ok] pam_usertype.so isregular

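My POSIX UID in LDAP is 532734, which is below the SYS_UID_MAX of 999999. I’m not sure of the exact test, but since 532734 is not between 999001 and 999999, it looks like a comparison against SYS_UID_MAX alone rather than a range check. PAM decides I can’t log in even before it sends a request to SSSD to authenticate against our KDC.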
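The effect can be modelled with a short Python sketch. This is not pam_usertype's actual code. The comparison used here (a UID counts as a system account when it is at or below SYS_UID_MAX and below UID_MIN) is an assumption that happens to match the behaviour I observed, and the function names are made up for the example. The "stock" values are the RHEL 8 defaults from /etc/login.defs.

```python
# Our modified values (from the login.defs excerpt above).
SITE_LOGIN_DEFS = """\
UID_MIN               1000001
UID_MAX               1009999
# System accounts
SYS_UID_MIN            999001
SYS_UID_MAX            999999
"""

# Stock RHEL 8 values, for comparison.
STOCK_LOGIN_DEFS = """\
UID_MIN                  1000
UID_MAX                 60000
SYS_UID_MIN               201
SYS_UID_MAX               999
"""


def parse_login_defs(text: str) -> dict[str, int]:
    """Collect the numeric KEY VALUE settings, ignoring comments."""
    values = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) == 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def looks_like_system_account(uid: int, defs: dict[str, int]) -> bool:
    """ASSUMED test: system if uid <= SYS_UID_MAX and uid < UID_MIN."""
    uid_min = defs.get("UID_MIN", 1000)
    sys_uid_max = defs.get("SYS_UID_MAX", uid_min - 1)
    return uid <= sys_uid_max and uid < uid_min


if __name__ == "__main__":
    for label, text in (("site", SITE_LOGIN_DEFS), ("stock", STOCK_LOGIN_DEFS)):
        result = looks_like_system_account(532734, parse_login_defs(text))
        print(f"{label}: UID 532734 treated as system account: {result}")
```

Under that assumption, every LDAP UID below 1000001 is classed as a system account with our login.defs, while the same UID is an ordinary user with the stock values.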

I’m still deciding the best approach to avoiding the problem created by this change (and arguing with RH that a breaking change in the middle of a major release is a bug). I could change /etc/pam.d/system-auth back to a hard-coded value, but I’d rather not mess with the PAM default configuration. I might not actually need to change the UID range for system accounts, so changing that back to the default might be the path of least resistance, although I’m bound to have collision issues. Only a small number of collisions have caused actual problems, but I’d rather have a systematic fix than keep creating exceptions.
