SSH server somewhat broken since Rocky 9

Since Rocky 9, it takes quite a long time (on slow machines) to log in via SSH, regardless of whether you use keys or passwords.

I tracked down the problem and figured out that the reason for the very slow login is that PAM tries to close 65535(!!) file descriptors (in close_fds() / pam_modutil_sanitize.c - only the last 8 of them are valid). Before Rocky 9 (i.e. on Rocky 8 or CentOS 7), this was handled in ssh (closefrom).
Until further notice I have changed the close_fds() code to close at most 21 file descriptors.
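To illustrate the idea, a minimal sketch of that workaround (not my exact patch; it goes into pam_modutil_sanitize.c, which already includes <unistd.h> for close() and STDERR_FILENO):

/* Workaround sketch: cap the blind close() loop at a small fixed value
 * instead of letting it run up to 65535 candidates. */
static void
close_fds(void)
{
        /* 21 is just above _POSIX_OPEN_MAX (20); more than enough here. */
        const int WORKAROUND_MAX_FD_NO = 21;
        int fd;

        for (fd = WORKAROUND_MAX_FD_NO; fd > STDERR_FILENO; --fd)
                close(fd);
}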

BTW: In each case, UsePAM is active!
To be honest, I don’t understand why Rocky 8 and older use the ssh code to close the fds after clone(), while Rocky 9 uses close_fds() from PAM for the same situation.

Thanks for any hint!

Seems strange. I don’t have such problems with Rocky 9 and SSH; it connects fast every time. Perhaps there is something different in your configuration compared to a basic Rocky 9 install?

Take a fresh install out of the box. Server minimal image. That’s all you need to reproduce it.

If you have a fast machine, you most probably won’t notice it; on a slow machine you will. Just strace the master sshd process (strace -o strace.log -f -p $pid) and you will see it even on fast machines: look through the strace file afterwards and search for EBADF.

This should be fixed urgently! This behavior cannot / shouldn’t be intended, especially because no other distribution I know behaves like this - not even Rocky 8 or CentOS 7. At least openSUSE Tumbleweed and Leap (which also use PAM) don’t show this behavior. It’s just broken!

I cannot reproduce your issue. This is on a new VM with 1 CPU and 4 GB of RAM. I do not think your issue has anything to do with “speed”, but rather a configuration issue that you may need to resolve within your system. The bad file descriptors (EBADF) are, in my opinion, a red herring for the actual cause. From experience, this sort of issue crops up as a result of a bad DNS configuration or even how sssd is configured (when using an AD domain, for example).

It would be helpful to know how you configure your systems (e.g. are you configuring realm/sssd to authenticate to AD, using FreeIPA, or some other means? Are you configuring limits, or making other system configuration changes? If you don’t do any of the former, do you have DNS, and if so, is it properly set up?).

[nazu@router ~]$ time ssh nazu@10.100.0.173 'echo hello'
nazu@10.100.0.173's password:
hello

real    0m2.134s
user    0m0.015s
sys     0m0.006s
[nazu@router ~]$ ssh-copy-id nazu@10.100.0.173
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/nazu/.ssh/id_ed25519.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
nazu@10.100.0.173's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'nazu@10.100.0.173'"
and check to make sure that only the key(s) you wanted were added.

[nazu@router ~]$ time ssh nazu@10.100.0.173 'echo hello'
hello

real    0m0.318s
user    0m0.011s
sys     0m0.012s

[nazu@router ~]$ ssh nazu@10.100.0.173
Last login: Thu Aug  3 02:06:27 2023 from 10.100.0.1
[nazu@localhost ~]$ lscpu | grep '^CPU'
CPU op-mode(s):                  32-bit, 64-bit
CPU(s):                          1
CPU family:                      6
[nazu@localhost ~]$ free -m
               total        used        free      shared  buff/cache   available
Mem:            3646         414        3150           8         316        3231
Swap:           4023           0        4023

Perhaps I have expressed myself in a misleading way. It’s not the number of CPUs that is relevant, but the speed of the CPU. This part of the code is not threaded, so it doesn’t matter whether you have 1 or 128 CPUs - the CPU clock is what matters!

I used a fresh, unchanged installation from the boot image. Installation type: small server. Changed nothing, rebooted after install, logged in via ssh. That’s it. DNS is OK, and no, there is no AD or anything else involved. It’s just plain ssh (password or key authentication doesn’t matter). IP via DHCP (IPv4 only - no global IPv6 address - behavior just out of the box).
Do you want an etc.tar.gz containing the complete configuration? It shouldn’t be very interesting, though, because it’s just the output of the installer.

Works like a charm. But anyway, it’s not involved at all during login.

Let’s take a look at the strace and the code responsible for this behavior (PAM - pam_modutil_sanitize.c):

/* Closes all descriptors after stderr. */
static void
close_fds(void)
{
        /*
         * An arbitrary upper limit for the maximum file descriptor number
         * returned by RLIMIT_NOFILE.
         */
        const int MAX_FD_NO = 65535;

        /* The lower limit is the same as for _POSIX_OPEN_MAX. */
        const int MIN_FD_NO = 20;

        int fd;
        struct rlimit rlim;

        if (getrlimit(RLIMIT_NOFILE, &rlim) || rlim.rlim_max > (rlim_t)MAX_FD_NO)
                fd = MAX_FD_NO;
        else if (rlim.rlim_max < (rlim_t)MIN_FD_NO)
                fd = MIN_FD_NO;
        else
                fd = (int)rlim.rlim_max - 1;

        for (; fd > STDERR_FILENO; --fd)
                close(fd);
}

For each login, 65535 fds are closed after the PAM module is loaded by ssh via clone(). 65535 is an arbitrary value - why not 10000000, or even more? MIN_FD_NO would be enough!

Excerpt from strace:

1660  write(2, "debug3: PAM: sshpam_passwd_conv "..., 56) = 56 (ssh - auth_pam.c sshpam_passwd_conv)
1660  newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=2124, ...}, 0) = 0
1660  openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
1660  openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
1660  getpid()                          = 1660
1660  socket(AF_UNIX, SOCK_STREAM, 0)   = 4
1660  fcntl(4, F_GETFL)                 = 0x2 (flags O_RDWR)
1660  fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
1660  fcntl(4, F_GETFD)                 = 0
1660  fcntl(4, F_SETFD, FD_CLOEXEC)     = 0
1660  connect(4, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/nss"}, 110) = -1 ENOENT (No such file or directory)
1660  close(4)                          = 0
1660  openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 4
1660  newfstatat(4, "", {st_mode=S_IFREG|0644, st_size=1571, ...}, AT_EMPTY_PATH) = 0
1660  lseek(4, 0, SEEK_SET)             = 0
1660  read(4, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1571
1660  close(4)                          = 0
1660  geteuid()                         = 0
1660  pipe([4, 7])                      = 0
1660  rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f3405454df0}, {sa_handler=SIG_DFL, sa_mask=~[KILL STOP RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f3405454df0}, 8) = 0
1660  clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f3405610050) = 1664
1660  write(7, "passwd", 8)           = 8
1660  write(7, "\0", 1 <unfinished ...>
1664  set_robust_list(0x7f3405610060, 24 <unfinished ...>
1660  <... write resumed>)              = 1
1664  <... set_robust_list resumed>)    = 0
1660  close(4)                          = 0
1660  close(7)                          = 0
1664  dup2(4, 0 <unfinished ...>
1660  wait4(1664,  <unfinished ...>
1664  <... dup2 resumed>)               = 0
1664  pipe([8, 9])                      = 0
1664  close(9)                          = 0
1664  dup2(8, 1)                        = 1
1664  close(8)                          = 0
1664  dup2(1, 2)                        = 2
1664  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=512*1024}) = 0
1664  close(65535)                      = -1 EBADF (Bad file descriptor)
...
1664  close(10)                         = -1 EBADF (Bad file descriptor)
1664  close(9)                          = -1 EBADF (Bad file descriptor)
1664  close(8)                          = -1 EBADF (Bad file descriptor)
1664  close(7)                          = 0
1664  close(6)                          = 0
1664  close(5)                          = 0
1664  close(4)                          = 0
1664  close(3)                          = 0
1664  geteuid()                         = 0
1664  setuid(0)                         = 0
1664  execve("/usr/sbin/unix_chkpwd", ["/usr/sbin/unix_chkpwd", "andreas", "nonull"], 0x7f34050ff040 /* 0 vars */) = 0
1664  close(3)                          = 0
1664  getuid()                          = 0
1664  exit_group(0)                     = ?
1664  +++ exited with 0 +++
1660  <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 1664
1660  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1664, si_uid=0, si_status=0, si_utime=0, si_stime=1} ---
1660  rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=~[KILL STOP RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f3405454df0}, NULL, 8) = 0
1660  socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) = 4
1660  fcntl(4, F_SETFD, FD_CLOEXEC)     = 0
1660  socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, NETLINK_ROUTE) = 7
1660  bind(7, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 0
1660  getsockname(7, {sa_family=AF_NETLINK, nl_pid=1660, nl_groups=00000000}, [12]) = 0
1660  sendto(7, [{nlmsg_len=20, nlmsg_type=RTM_GETADDR, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1690995697, nlmsg_pid=0}, {ifa_family=AF_UNSPEC, ...}], 20, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 20
1660  recvmsg(7, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=76, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1690995697, nlmsg_pid=1660}, {ifa_family=AF_INET, ifa_prefixlen=8, ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_HOST, ifa_index=if_nametoindex("lo")}, [[{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("127.0.0.1")], [{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("127.0.0.1")], [{nla_len=7, nla_type=IFA_LABEL}, "lo"], [{nla_len=8, nla_type=IFA_FLAGS}, IFA_F_PERMANENT], [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295, ifa_valid=4294967295, cstamp=50, tstamp=50}]]], [{nlmsg_len=88, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1690995697, nlmsg_pid=1660}, {ifa_family=AF_INET, ifa_prefixlen=24, ifa_flags=0, ifa_scope=RT_SCOPE_UNIVERSE, ifa_index=if_nametoindex("ens3")}, [[{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("192.168.1.76")], [{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("192.168.1.76")], [{nla_len=8, nla_type=IFA_BROADCAST}, inet_addr("192.168.1.255")], [{nla_len=9, nla_type=IFA_LABEL}, "ens3"], [{nla_len=8, nla_type=IFA_FLAGS}, IFA_F_NOPREFIXROUTE], [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=1727819, ifa_valid=1727819, cstamp=370, tstamp=370}]]]], iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 164
1660  recvmsg(7, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=72, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1690995697, nlmsg_pid=1660}, {ifa_family=AF_INET6, ifa_prefixlen=128, ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_HOST, ifa_index=if_nametoindex("lo")}, [[{nla_len=20, nla_type=IFA_ADDRESS}, inet_pton(AF_INET6, "::1")], [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295, ifa_valid=4294967295, cstamp=50, tstamp=50}], [{nla_len=8, nla_type=IFA_FLAGS}, IFA_F_PERMANENT]]], [{nlmsg_len=72, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1690995697, nlmsg_pid=1660}, {ifa_family=AF_INET6, ifa_prefixlen=64, ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_LINK, ifa_index=if_nametoindex("ens3")}, [[{nla_len=20, nla_type=IFA_ADDRESS}, inet_pton(AF_INET6, "fe80::5054:ff:fe3e:ec59")], [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295, ifa_valid=4294967295, cstamp=367, tstamp=367}], [{nla_len=8, nla_type=IFA_FLAGS}, IFA_F_PERMANENT|IFA_F_NOPREFIXROUTE]]]], iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 144
1660  recvmsg(7, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1690995697, nlmsg_pid=1660}, 0], iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
1660  close(7)                          = 0
1660  readlink("/proc/self/exe", "/usr/sbin/sshd", 4096) = 14
1660  sendto(4, [{nlmsg_len=156, nlmsg_type=0x44c /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=3, nlmsg_pid=0}, "\x6f\x70\x3d\x50\x41\x4d\x3a\x61\x75\x74\x68\x65\x6e\x74\x69\x63\x61\x74\x69\x6f\x6e\x20\x67\x72\x61\x6e\x74\x6f\x72\x73\x3d\x70"...], 156, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 156
1660  poll([{fd=4, events=POLLIN}], 1, 500) = 1 ([{fd=4, revents=POLLIN}])
1660  recvfrom(4, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=3, nlmsg_pid=1660}, {error=0, msg={nlmsg_len=156, nlmsg_type=AUDIT_FIRST_USER_MSG, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=3, nlmsg_pid=0}}], 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, [12]) = 36
1660  recvfrom(4, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=3, nlmsg_pid=1660}, {error=0, msg={nlmsg_len=156, nlmsg_type=AUDIT_FIRST_USER_MSG, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=3, nlmsg_pid=0}}], 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, [12]) = 36
1660  close(4)                          = 0
1660  getpid()                          = 1660
1660  write(2, "debug1: PAM: password authentica"..., 59) = 59

Yes, you are right, the major problem is: why is this code path used at all? That’s what I’m wondering. No other distribution uses this code path. I additionally just tested Ubuntu Server 22.04, for example - it behaves as it should, too. It’s just Rocky 9 and Alma 9 (at least they are bug compatible :slight_smile: ).

I “fixed” (i.e. worked around) the bug for myself, and now it can be used on slow machines, too. The implementation above is highly questionable, and for me it is especially questionable why this code path is used at all. From my point of view, it shouldn’t be used at all; normally the closefrom() function from ssh should be used (that, at least, is the behavior of Rocky 8 and CentOS 7).
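For comparison, a closefrom()-style cleanup doesn’t need a blind loop at all. On el9 (kernel 5.14 / glibc 2.34) the close_range(2) wrapper should be available, so a minimal sketch of that approach could look like this - my own illustration, not what the shipped sshd or PAM code actually does:

#define _GNU_SOURCE
#include <unistd.h>   /* close_range() (glibc >= 2.34), STDERR_FILENO, close() */

/* Sketch: ask the kernel to close everything above stderr in one call,
 * instead of issuing tens of thousands of close() syscalls. */
static void
close_fds_above_stderr(void)
{
        /* ~0U means "up to the highest possible descriptor number";
         * a real implementation would fall back to a loop on failure. */
        (void)close_range(STDERR_FILENO + 1, ~0U, 0);
}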

Thanks!
laola23

DNS is involved with SSH, unless of course UseDNS no is set in /etc/ssh/sshd_config. It may be worth trying that option just in case.

UseDNS is off by default.

This can’t be related, because the problem happens at the point where the PAM modules get involved (clone …), and they are always loaded. After a clone, you often have to handle fds; that is perfectly normal. But here it looks broken to me: it doesn’t make any sense to close tens of thousands of unused fds. ssh is a very security-critical application, and I don’t like unreliable and unpredictable behavior in security-critical applications, especially when all the others handle it just as they should. Why does only Rocky 9 behave like this?

Thanks
laola23

As we’ve said throughout the thread, we can’t reproduce your issue.

You talked about the speed of the CPU, but you haven’t shown the speed of your CPUs to show that it could be a part of your issue. More hardware details would be important here to try to reliably reproduce your issue. You could also install sos, run sos report, and put the resulting tarball somewhere we can download and analyze it.

Barring that, if you believe what you found is an erroneous code path, you may want to reproduce your issue on CentOS Stream 9 and RHEL 9 and report it to Red Hat’s Bugzilla.


The current CentOS 9 Stream boot ISO produces an unbootable installation …

Is it possible to start a private conversation here? I could provide you with a link to a KVM VM.

I don’t know why hardware details are relevant - the error is completely independent of the hardware. But if it helps: it’s an APU 4D4.

Update:
Found the culprit: SELinux obviously forces ssh to use this broken code path - even if it is set to permissive. Disabling it makes ssh work again as it should.

LOL, you said in your very first post and I quote:

it takes quite a long time (on slow machines) to log in via SSH

so obviously the hardware specs are related here, since many of us have already stated we do not have this problem whatsoever, even with selinux enabled or disabled.

You say disabling SELinux fixes the problem and you blame a broken code path - without providing any concrete evidence that that is the failure.

Most likely the problem here is an SELinux issue that would have shown up in /var/log/audit/audit.log, which would explain why it works when disabled. In fact you could also have put SELinux in permissive mode and it would have worked, and the SELinux violation would still be logged in audit.log. Addressing the SELinux violation would most likely have fixed it, and allowed you to keep SELinux enabled with ssh still working fast.

But again, none of us are having this, it is unique to your installation/hardware. That said, glad it’s now working for you with selinux disabled - which means there was nothing wrong with SSH in the first place.

As @nazunalika said, if it was code-related, it would require opening a bug with Red Hat, assuming you can reproduce it on RHEL 9 or CentOS 9 Stream - chances are it would also fail there, since Rocky is the same as RHEL 9.

I’ve had cases where audit.log did not get “denied” entries. Then I found:

man semanage-dontaudit

and after running semanage dontaudit off the blocking reasons finally appeared in the log.


I have several el9 systems, but none have shown issues with ssh. Then again, they are all Intel or AMD, and even the oldest (from ~2012) are not truly “slow”.


I’m a bit confused about who makes the “non-optimal” function calls - is it sshd or is it PAM? From what I remember seeing on bugzillas, maintainers of each component are eager to hand the issue over to the “actual culprit”.


No - this problem can be seen in the VM, too. But it’s masked by the hardware (you don’t notice it because of the fast VM CPU). As I already wrote here:

It’s not the number of CPUs that is relevant, but the speed of the CPU.

Therefore, it is not and never was hardware related.

Two things here:

  1. Selinux forces ssh / PAM to use another codepath, which is
  2. a broken one.

Update:
I checked Rocky 8 regarding SELinux and this behavior. ssh just works correctly there, even with SELinux enabled (that’s what I had previously tested; the default in 8 is SELinux enabled, just as in 9). Therefore: something is broken in 9 with ssh and SELinux enabled.

And where is the evidence of this broken codepath? You claim it’s broken, but don’t provide anything to back it up. If it is broken, as was already suggested, replicate in RHEL9 and CentOS 9 Stream and open a bug report with Red Hat.

Did you not read / understand the responsible code I already cited here?

Replicate in RHEL 9 as mentioned by @nazunalika, and open a bug here: https://bugzilla.redhat.com/ if you believe this to be the problem.

We cannot replicate it; even @jlehtone on older hardware couldn’t. But that was an Intel/AMD CPU, so it is possibly related to your hardware.

Rocky is bug-for-bug compatible with RHEL, so it needs to be resolved upstream with RHEL.

To be fair, my “not shown” is “did not notice sluggishness” rather than “not in strace” – haven’t done tracing.


The pam_modutil_sanitize.c is mentioned in 1737812 – optimize closing fds when spawning handlers in PAM by looking up opened handles via /proc/self/fd – an RHEL 8 case. Apparently resolved (for RHEL 8) with pam-1.3.1-8.el8 in April 2020.
However, the code snippets in bugzilla do not match the current upstream.
Apparently, upstream did revert: Revert "libpam/pam_modutil_sanitize.c: optimize the way to close fds" · linux-pam/linux-pam@a7b9ffd · GitHub
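For reference, the optimization described in that Bugzilla entry boils down to closing only the descriptors that actually appear in /proc/self/fd, roughly like this (my own sketch of the idea, not the exact patch that was reverted):

#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch: close only the fds listed in /proc/self/fd instead of blindly
 * looping up to RLIMIT_NOFILE / 65535. A real implementation also needs
 * a fallback to the plain loop when /proc is not mounted. */
static void
close_fds_via_proc(void)
{
        DIR *dir = opendir("/proc/self/fd");
        struct dirent *ent;
        int dfd, fd;

        if (dir == NULL)
                return;                      /* fall back to the loop in real code */

        dfd = dirfd(dir);
        while ((ent = readdir(dir)) != NULL) {
                if (ent->d_name[0] == '.')
                        continue;            /* skip "." and ".." */
                fd = atoi(ent->d_name);
                if (fd > STDERR_FILENO && fd != dfd)
                        close(fd);           /* keep std fds and the DIR's own fd */
        }
        closedir(dir);
}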

Who says 65535? I see “only”:

int fd;
struct rlimit rlim;

if (getrlimit(RLIMIT_NOFILE, &rlim) || rlim.rlim_max > (rlim_t)MAX_FD_NO)
    fd = MAX_FD_NO;
else if (rlim.rlim_max < (rlim_t)MIN_FD_NO)
    fd = MIN_FD_NO;
else
    fd = (int)rlim.rlim_max - 1;

for (; fd > STDERR_FILENO; --fd)
    close(fd);

That is, the loop starts at rlim.rlim_max - 1, clamped between MIN_FD_NO and MAX_FD_NO; it does not unconditionally close 65535 descriptors.
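Although, going by the limit visible in the strace above (prlimit64 reports rlim_max=512*1024, well above the cap), the loop there would indeed start at 65535 - matching the close(65535) line. A tiny standalone check of that clamping, just for illustration:

#include <stdio.h>

/* Reproduce the clamping from close_fds() with the value seen in the
 * strace above: rlim_max = 512 * 1024 = 524288. */
int main(void)
{
        const long MAX_FD_NO = 65535;
        const long MIN_FD_NO = 20;
        long rlim_max = 512 * 1024;   /* from prlimit64(..., rlim_max=512*1024) */
        long fd;

        if (rlim_max > MAX_FD_NO)
                fd = MAX_FD_NO;
        else if (rlim_max < MIN_FD_NO)
                fd = MIN_FD_NO;
        else
                fd = rlim_max - 1;

        printf("loop starts at fd %ld\n", fd);   /* prints 65535 */
        return 0;
}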


Does el9 have the upstream version of pam_modutil_sanitize.c, or does it have something different?

IP via DHCP (IPv4 only - no global IPv6 address - behavior just out of the box).

I do not have an R9 install to verify this, but I have always had issues with SSH initial login delays where sshd was in an IPv4-only environment. I found that I needed to edit /etc/ssh/sshd_config and take IPv6 out of the equation by setting…

AddressFamily inet
ListenAddress 0.0.0.0  #modify as required

Then restart the sshd service

This has always eliminated my delay issues in sshd on RHEL-derived OSes. Not saying this will solve yours, but it’s something to try in addition to what these extremely patient community leaders have expressed.


Just an FYI: I was also seeing very slow ssh root logins (45-60 seconds) over the local LAN.
A yum update, which included an update to pam-1.3.1-27 (from 1.3.1-25), seemed to fix it.

Things that didn’t fix it:

  • Turning off GSSAPI auth in sshd_config.
  • Permissive SELinux (did not try disabled).

Andrew