Upgrade to Rocky 9.3 breaks running FreeIPA installation; bind dumps core

After upgrading to Rocky Linux 9.3 and bind-9.16.23-14, named fails badly while loading its configuration and dumps core.

This is the startup of bind:

Apr 23 21:03:27 ipa named[7564]: starting BIND 9.16.23-RH (Extended Support Version) id:fde3b1f
Apr 23 21:03:27 ipa named[7564]: running on Linux x86_64 5.14.0-362.24.1.el9_3.0.1.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 4 22:31:43 UTC 2024

And this is the dump:

Apr 23 21:03:27 ipa named[7564]: configuring command channel from '/etc/rndc.key'
Apr 23 21:03:27 ipa named[7564]: command channel listening on 127.0.0.1#953
Apr 23 21:03:27 ipa named[7564]: configuring command channel from '/etc/rndc.key'
Apr 23 21:03:27 ipa named[7564]: command channel listening on ::1#953
Apr 23 21:03:27 ipa named[7564]: managed-keys-zone: loaded serial 3731
Apr 23 21:03:27 ipa named[7564]: ../../../lib/isc/hp.c:156: INSIST(tid() < isc__hp_max_threads) failed, back trace
Apr 23 21:03:27 ipa named[7564]: #0 0x560f2da54621 in ??
Apr 23 21:03:27 ipa named[7564]: #1 0x7fb9bd78b4e0 in ??
Apr 23 21:03:27 ipa named[7564]: #2 0x7fb9bd78fa4d in ??
Apr 23 21:03:27 ipa named[7564]: #3 0x7fb9bd7bc99c in ??
Apr 23 21:03:27 ipa named[7564]: #4 0x7fb9bd7a7d15 in ??
Apr 23 21:03:27 ipa named[7564]: #5 0x7fb9bd7c50fc in ??
Apr 23 21:03:27 ipa named[7564]: #6 0x7fb9b90eab93 in ??
Apr 23 21:03:27 ipa named[7564]: #7 0x7fb9b90eb354 in ??
Apr 23 21:03:27 ipa named[7564]: #8 0x7fb9b90a2bfa in ??
Apr 23 21:03:27 ipa named[7564]: #9 0x7fb9b90a3943 in ??
Apr 23 21:03:27 ipa named[7564]: #10 0x7fb9b90ec585 in ??
Apr 23 21:03:27 ipa named[7564]: #11 0x7fb9b90eca43 in ??
Apr 23 21:03:27 ipa named[7564]: #12 0x7fb9bd7c468a in ??
Apr 23 21:03:27 ipa named[7564]: #13 0x7fb9bcc9f802 in ??
Apr 23 21:03:27 ipa named[7564]: #14 0x7fb9bcc3f450 in ??
Apr 23 21:03:27 ipa named[7564]: exiting (due to assertion failure)
Apr 23 21:03:27 ipa systemd[1]: Started Process Core Dump (PID 7571/UID 0).
Apr 23 21:03:27 ipa systemd-coredump[7572]: Process 7564 (named) of user 25 dumped core.

Stack trace of thread 7569:
#0 0x00007fb9bcca154c __pthread_kill_implementation (libc.so.6 + 0xa154c)
#1 0x00007fb9bcc54d06 raise (libc.so.6 + 0x54d06)
#2 0x00007fb9bcc287f3 abort (libc.so.6 + 0x287f3)
#3 0x0000560f2da515b5 assertion_failed.cold (named + 0x1c5b5)
#4 0x00007fb9bd78b4e0 isc_assertion_failed (libisc-9.16.23-RH.so + 0x1c4e0)
#5 0x00007fb9bd78fa4d isc_hp_protect (libisc-9.16.23-RH.so + 0x20a4d)
#6 0x00007fb9bd7bc99c isc_queue_enqueue (libisc-9.16.23-RH.so + 0x4d99c)
#7 0x00007fb9bd7a7d15 isc__nm_enqueue_ievent (libisc-9.16.23-RH.so + 0x38d15)
#8 0x00007fb9bd7c50fc isc_task_sendto (libisc-9.16.23-RH.so + 0x560fc)
#9 0x00007fb9b90eab93 syncrepl_update (ldap.so + 0x1cb93)
#10 0x00007fb9b90eb354 ldap_sync_search_entry (ldap.so + 0x1d354)
#11 0x00007fb9b90a2bfa ldap_sync_search_entry (libldap.so.2 + 0x45bfa)
#12 0x00007fb9b90a3943 ldap_sync_init (libldap.so.2 + 0x46943)
#13 0x00007fb9b90ec585 ldap_sync_doit (ldap.so + 0x1e585)
#14 0x00007fb9b90eca43 ldap_syncrepl_watcher.lto_priv.0 (ldap.so + 0x1ea43)
#15 0x00007fb9bd7c468a isc__trampoline_run (libisc-9.16.23-RH.so + 0x5568a)
#16 0x00007fb9bcc9f802 start_thread (libc.so.6 + 0x9f802)
#17 0x00007fb9bcc3f450 __clone3 (libc.so.6 + 0x3f450)

Stack trace of thread 7564:
#0 0x00007fb9bcc55aca __sigtimedwait (libc.so.6 + 0x55aca)
#1 0x00007fb9bcc5510c sigwait (libc.so.6 + 0x5510c)
#2 0x00007fb9bd795003 isc_app_ctxrun (libisc-9.16.23-RH.so + 0x26003)
#3 0x00007fb9bd7952cc isc_app_run (libisc-9.16.23-RH.so + 0x262cc)
#4 0x0000560f2da5245a main (named + 0x1d45a)
#5 0x00007fb9bcc3feb0 __libc_start_call_main (libc.so.6 + 0x3feb0)
#6 0x00007fb9bcc3ff60 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x3ff60)
#7 0x0000560f2da52f85 _start (named + 0x1df85)

Stack trace of thread 7567:
#0 0x00007fb9bcc9c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a)
#1 0x00007fb9bcc9eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4)
#2 0x00007fb9bd7d6180 isc_condition_waituntil (libisc-9.16.23-RH.so + 0x67180)
#3 0x00007fb9bd7c83ef run (libisc-9.16.23-RH.so + 0x593ef)
#4 0x00007fb9bd7c468a isc__trampoline_run (libisc-9.16.23-RH.so + 0x5568a)
#5 0x00007fb9bcc9f802 start_thread (libc.so.6 + 0x9f802)
#6 0x00007fb9bcc3f450 __clone3 (libc.so.6 + 0x3f450)

Stack trace of thread 7568:
#0 0x00007fb9bcd4ea6e epoll_wait (libc.so.6 + 0x14ea6e)
#1 0x00007fb9bd7cfb7c netthread (libisc-9.16.23-RH.so + 0x60b7c)
#2 0x00007fb9bd7c468a isc__trampoline_run (libisc-9.16.23-RH.so + 0x5568a)
#3 0x00007fb9bcc9f802 start_thread (libc.so.6 + 0x9f802)
#4 0x00007fb9bcc3f450 __clone3 (libc.so.6 + 0x3f450)

Stack trace of thread 7566:
#0 0x00007fb9bcc3e6bb __sched_yield (libc.so.6 + 0x3e6bb)
#1 0x00007fb9bd5bdb75 uv__async_io.part.0 (libuv.so.1 + 0xab75)
#2 0x00007fb9bd5d985e uv__io_poll.part.0 (libuv.so.1 + 0x2685e)
#3 0x00007fb9bd5c35a8 uv_run (libuv.so.1 + 0x105a8)
#4 0x00007fb9bd7b1b5b nm_thread (libisc-9.16.23-RH.so + 0x42b5b)
#5 0x00007fb9bd7c468a isc__trampoline_run (libisc-9.16.23-RH.so + 0x5568a)
#6 0x00007fb9bcc9f802 start_thread (libc.so.6 + 0x9f802)
#7 0x00007fb9bcc3f450 __clone3 (libc.so.6 + 0x3f450)

Stack trace of thread 7565:
#0 0x00007fb9bcd3eb7f __write (libc.so.6 + 0x13eb7f)
#1 0x00007fb9bd5c2db1 uv_async_send (libuv.so.1 + 0xfdb1)
#2 0x00007fb9bd7c50fc isc_task_sendto (libisc-9.16.23-RH.so + 0x560fc)
#3 0x00007fb9bd91c263 dns_resolver_createfetch (libdns-9.16.23-RH.so + 0x11c263)
#4 0x00007fb9bd96ffbd zone_refreshkeys.lto_priv.0 (libdns-9.16.23-RH.so + 0x16ffbd)
#5 0x00007fb9bd9789cf zone_timer.lto_priv.0 (libdns-9.16.23-RH.so + 0x1789cf)
#6 0x00007fb9bd7c68ad isc_task_run (libisc-9.16.23-RH.so + 0x578ad)
#7 0x00007fb9bd7b1929 process_netievent (libisc-9.16.23-RH.so + 0x42929)
#8 0x00007fb9bd7b1aa5 process_queue (libisc-9.16.23-RH.so + 0x42aa5)
#9 0x00007fb9bd7b2297 async_cb (libisc-9.16.23-RH.so + 0x43297)
#10 0x00007fb9bd5bdb3d uv__async_io.part.0 (libuv.so.1 + 0xab3d)
#11 0x00007fb9bd5d985e uv__io_poll.part.0 (libuv.so.1 + 0x2685e)
#12 0x00007fb9bd5c35a8 uv_run (libuv.so.1 + 0x105a8)
#13 0x00007fb9bd7b1b37 nm_thread (libisc-9.16.23-RH.so + 0x42b37)
#14 0x00007fb9bd7c468a isc__trampoline_run (libisc-9.16.23-RH.so + 0x5568a)
#15 0x00007fb9bcc9f802 start_thread (libc.so.6 + 0x9f802)
#16 0x00007fb9bcc3f450 __clone3 (libc.so.6 + 0x3f450)

ELF object binary architecture: AMD x86-64
Apr 23 21:03:27 ipa systemd[1]: named.service: Control process exited, code=exited, status=1/FAILURE
Apr 23 21:03:27 ipa systemd[1]: named.service: Killing process 7569 (isc-net-0000) with signal SIGKILL.
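Since systemd-coredump already captured the core (see the log above), a symbolized backtrace can usually be recovered without reproducing the crash. A hedged sketch; the PID is the one from the log above, and the debuginfo package names are an assumption that may need adjusting for your repos:

```shell
# List cores captured by systemd-coredump for named, then show the
# metadata and the stack trace systemd recorded for this crash.
coredumpctl list named
coredumpctl info 7564

# Install debug symbols so frames resolve to function names
# (package names assumed; requires the debuginfo repos to be enabled).
dnf debuginfo-install bind bind-dyndb-ldap

# Open gdb directly on the core; inside gdb, run 'bt full'.
coredumpctl debug 7564
```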

The solution I found was to downgrade just the affected packages to 9.16.23-11:

bind-9.16.23-11.el9_2.2.x86_64.rpm
bind-dnssec-doc-9.16.23-11.el9_2.2.noarch.rpm
bind-dnssec-utils-9.16.23-11.el9_2.2.x86_64.rpm
bind-dyndb-ldap-11.9-8.el9_1.x86_64.rpm
bind-libs-9.16.23-11.el9_2.2.x86_64.rpm
bind-license-9.16.23-11.el9_2.2.noarch.rpm
bind-utils-9.16.23-11.el9_2.2.x86_64.rpm
python3-bind-9.16.23-11.el9_2.2.noarch.rpm

Looking at my IPA systems, they are fully updated and they do not exhibit this behavior.

Did you do a full update, or did you cherry-pick updates? When you updated from 9.2 to 9.3, did you reboot the system?

Full update from 9.2 to 9.3, and after a restart the system behaves this way.

It seems to be complaining about "hp max threads", so check any configuration related to threading. Why does it say "Extended Support Version"?

Apr 23 18:39:26 ipa01 named[5121]: starting BIND 9.16.23-RH (Extended Support Version) <id:fde3b1f>

That’s just what it reports and is normal in our packages.

Just to add to this, the upgrade broke bind for me as well:

Apr 28 04:21:35 ipa named[2891]: ../../../lib/isc/hp.c:156: INSIST(tid() < isc__hp_max_threads) failed, back trace
Apr 28 04:21:35 ipa named[2891]: #0 0x555c7ae04621 in ??
Apr 28 04:21:35 ipa named[2891]: #1 0x7ffaea38b4e0 in ??
Apr 28 04:21:35 ipa named[2891]: #2 0x7ffaea38fa4d in ??
Apr 28 04:21:35 ipa named[2891]: #3 0x7ffaea3bc99c in ??
Apr 28 04:21:35 ipa named[2891]: #4 0x7ffaea3a7d15 in ??
Apr 28 04:21:35 ipa named[2891]: #5 0x7ffaea3c50fc in ??
Apr 28 04:21:35 ipa named[2891]: #6 0x7ffae5c7eb93 in ??
Apr 28 04:21:35 ipa named[2891]: #7 0x7ffae5c7f354 in ??
Apr 28 04:21:35 ipa named[2891]: #8 0x7ffae5c40bfa in ??
Apr 28 04:21:35 ipa named[2891]: #9 0x7ffae5c41943 in ??
Apr 28 04:21:35 ipa named[2891]: #10 0x7ffae5c80585 in ??
Apr 28 04:21:35 ipa named[2891]: #11 0x7ffae5c80998 in ??
Apr 28 04:21:35 ipa named[2891]: #12 0x7ffaea3c468a in ??
Apr 28 04:21:35 ipa named[2891]: #13 0x7ffae989f802 in ??
Apr 28 04:21:35 ipa named[2891]: #14 0x7ffae983f450 in ??
Apr 28 04:21:35 ipa named[2891]: exiting (due to assertion failure)

Can you provide any info on downgrading to these?
dnf downgrade bind
gives me:

Package bind of lowest version already installed, cannot downgrade it.

The system does not provide a direct path to downgrade.
I looked in the repos for the previous versions and downloaded them manually.
Then I applied the downgrade.
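The steps described above can be sketched as follows. The package versions are the ones listed earlier in this thread; the comment about where to find them and the versionlock step are assumptions, not something confirmed by the poster:

```shell
# After manually downloading the previous builds into the current
# directory (older point-release packages are kept in the Rocky vault
# repos; the exact location is an assumption here), downgrade from
# the local files:
dnf downgrade ./bind-9.16.23-11.el9_2.2.x86_64.rpm \
    ./bind-libs-9.16.23-11.el9_2.2.x86_64.rpm \
    ./bind-utils-9.16.23-11.el9_2.2.x86_64.rpm \
    ./bind-dyndb-ldap-11.9-8.el9_1.x86_64.rpm

# Optionally pin the packages so a routine 'dnf update' does not pull
# the crashing build back in (requires the versionlock plugin):
dnf install python3-dnf-plugin-versionlock
dnf versionlock add 'bind*'
```

Remember to remove the lock (dnf versionlock delete 'bind*') once a fixed build is available, or future security updates will be skipped.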

I’m still failing to reproduce this issue in at least three IPA domains. In the Rocky Linux infrastructure for example, our IPA and bind versions are at the latest as provided by 9.3. In 9.4, the versions will upgrade again.

It would be helpful to know the hardware specs of these systems, the configuration that has been applied (such as hardening, e.g. CIS benchmarks), and any other potential changes. I would ask you to install sos and run a sos report, but the forums do not allow uploading archives that large.

I get the same error message on my IPA cluster, running in a Rocky 9 Docker container with Arch Linux as the Docker host.
I'm using freeipa/freeipa-server:rocky-9, which was updated a few days ago.


After a bit of testing I found that bind does start after a few tries; sometimes I need 1-2 attempts, sometimes more like 8-10, and when it does not start I get the same error message.
This was also how I got IPA updated: starting a parallel shell and making sure bind stayed running during the update.
After the update I noticed that my IPA instance no longer starts in the container as it should. So after starting the container, I need to exec a bash, start bind manually, and then start IPA again. After that my IPA is up and running fine, but it does not survive a reboot or restart.
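The manual recovery described above might look like this from the Docker host. The container name ipa1 and the service unit named are assumptions (some FreeIPA container builds use named-pkcs11 instead):

```shell
# Start the container, then bring bind up by hand before starting IPA.
docker start ipa1
docker exec -it ipa1 bash -c 'systemctl start named && ipactl start'

# Verify that all IPA services came up:
docker exec -it ipa1 ipactl status
```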

Affected bind packages:
[root@ipa1 /]# rpm -qa | grep ^bind
bind-license-9.16.23-14.el9_3.4.noarch
bind-libs-9.16.23-14.el9_3.4.x86_64
bind-utils-9.16.23-14.el9_3.4.x86_64
bind-dnssec-doc-9.16.23-14.el9_3.4.noarch
bind-dnssec-utils-9.16.23-14.el9_3.4.x86_64
bind-9.16.23-14.el9_3.4.x86_64
bind-dyndb-ldap-11.9-8.el9_3.3.x86_64
[root@ipa1 /]#

I am on AlmaLinux and also see this problem.

It started with the upgrade to bind-9.16.23-14.el9_3.4.x86_64.rpm and bind-dyndb-ldap-11.9-8.el9_3.3.alma.1.x86_64.rpm; it works with bind-9.16.23-14.el9_3.x86_64.rpm and bind-dyndb-ldap-11.9-8.el9_1.x86_64.rpm. The interesting part is that it only fails on one system (out of 5 VMs and 2 on real hardware), and the VM where it fails has ONLY ONE CPU. All the other systems have at least two cores.

So I tried lowering the number of vCPUs on one of the other VMs, and there it goes: it crashed after the reboot, when starting named. So I guess it is somehow related to the number of available cores.
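Given that correlation, a quick way to see whether a host falls into the risky single-core case is a small pre-flight check before starting named. This is a hypothetical helper, not part of any official tooling:

```shell
# Warn when only one CPU core is visible, since the crashes reported in
# this thread were only observed on single-vCPU guests.
cores=$(nproc)
if [ "$cores" -lt 2 ]; then
    echo "WARNING: only $cores CPU core(s); named + bind-dyndb-ldap has been seen to abort on 1-core systems" >&2
else
    echo "OK: $cores CPU cores available"
fi
```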


Thank you for noting how many CPUs you tried and that systems were crashing with just 1 CPU. I asked for system specs from the OP and others in this thread and haven't received anything, so this nails down something important that anyone running IPA should know about the hardware resources required.

This should’ve been the part that rang a bell.

The official documentation for IPA does not talk about CPU cores; it talks about RAM (4 GB at a minimum) and storage (10 GB at a minimum). Given that there are several key components in an IPA installation (bind plus bind's dyndb LDAP plugin, the Dogtag CA which runs on Java, krb5kdc, 389 DS), it stands to reason that 1 CPU is simply not enough. I have always recommended at least 2 CPUs for this very reason.

If you have at least 2 CPU cores and are still seeing this issue, please let us know and we can continue to troubleshoot. Please note that we are also planning to push 9.4 soon.
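For KVM/libvirt guests, bumping a VM to two vCPUs can be sketched as below. The domain name ipa01 is hypothetical, and ESXi and Hyper-V have their own equivalents in their management UIs:

```shell
# Raise both the maximum and the active vCPU count in the stored
# domain config; the change takes effect on the next boot.
virsh setvcpus ipa01 2 --config --maximum
virsh setvcpus ipa01 2 --config

# Restart the guest to pick up the new topology.
virsh shutdown ipa01
virsh start ipa01
# Inside the guest afterwards, 'nproc' should report 2.
```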


Had the same issue.
I reinstalled a test environment with Hyper-V on Windows 11: no problems there. So back to my ESXi 7.0.2 server, where I reinstalled a fresh Rocky 9.4 server and installed a replica. Same issue as described in this topic.

So I bumped it to 2 CPUs, reverted to a previous snapshot of my server, and reinstalled the replica again. This time, no problems at all. I installed the replica server, CA, and DNS separately from each other, so it was easier to revert the snapshots to a specific stage of the installation.

Hope this helps a little; it looks like an issue between VMware ESXi, the DNS install, and the number of CPUs.
My server had 8 GB RAM, so no complaints about memory usage.