Hello,
After upgrading to Rocky Linux 9.3, the kernel loses one of the logical CPUs in my systems.
I have 25 identical systems, each a Gigabyte R183-Z90-AAD1 with two AMD EPYC 9554 64-core processors. In total there should be 2 × 64 × 2 = 256 hardware threads (logical cores) per system. With the kernel in Rocky Linux 9.2, all 256 were detected, reported, and usable.
After upgrading to Rocky Linux 9.3, I see the following on all 25 systems:
prompt> journalctl -k
…
localhost kernel: smpboot: Allowing 256 CPUs, 0 hotplug CPUs
…
localhost kernel: smpboot: do_boot_cpu failed(-1) to wakeup CPU#128
…
The system tells me that one logical processor is offline (output translated from the German locale):
prompt> lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-127,129-255
Off-line CPU(s) list: 128 <------------
Vendor ID: AuthenticAMD
BIOS Vendor ID: Advanced Micro Devices, Inc.
Model name: AMD EPYC 9554 64-Core Processor
BIOS Model name: AMD EPYC 9554 64-Core Processor
…
It is not possible to bring CPU#128 back online:
prompt> echo 1 > /sys/devices/system/cpu/cpu128/online
-bash: echo: write error: I/O error.
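The same state can also be read straight from sysfs, which makes it easier to compare all 25 machines. This is only a minimal sketch: the function name `cpu_state` and its optional root-directory argument are my own illustration, not part of any existing tool. It relies on the standard `present`, `online` and `offline` files under /sys/devices/system/cpu.

```shell
# Sketch: print the kernel's view of present/online/offline CPUs.
# cpu_state and its optional argument are hypothetical helpers;
# the argument is an alternative sysfs root and defaults to /sys.
cpu_state() {
    cpu_dir="${1:-/sys}/devices/system/cpu"
    for name in present online offline; do
        # Each file holds a CPU range list such as "0-127,129-255";
        # a missing or empty file simply yields an empty value.
        printf '%s: %s\n' "$name" "$(cat "$cpu_dir/$name" 2>/dev/null)"
    done
}
```

Calling `cpu_state` without arguments on an affected system should list CPU 128 on the `offline:` line, matching the lscpu output.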
Any hints?
Best regards,
Rudi