We’re running Rocky 8.7 on Fujitsu A64FX (ARMv8) and trying to benchmark with HPCToolkit, which relies on perf counter support. The perf support is stock, as the OS, package and kernel versions below show:
20:41:25 root@compute104.godzilla:~ # grep -i version /etc/os-release
VERSION="8.7 (Green Obsidian)"
VERSION_ID="8.7"
ROCKY_SUPPORT_PRODUCT_VERSION="8.7"
REDHAT_SUPPORT_PRODUCT_VERSION="8.7"
19:54:35 root@compute104.godzilla:~ # rpm -q perf opencsd slang
perf-4.18.0-425.19.2.el8_7.aarch64
opencsd-1.0.0-4.el8.aarch64
slang-2.3.2-3.el8.aarch64
19:54:39 root@compute104.godzilla:~ # uname -a
Linux compute104 4.18.0-425.19.2.el8_7.aarch64 #1 SMP Tue Apr 4 19:39:14 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
Recording certain counters fails with 800b written (unparsable by ‘perf report’), and the CPU core associated with the task hangs in a soft lockup on [perf-exec] (reported by the watchdog timer); the task never returns and is uninterruptible. Raising the watchdog threshold or disabling the watchdog reduces the log output, but the core remains in soft lockup for at least ~15 minutes. No other cores are impacted, and the rest of the system keeps responding and looks relatively healthy with respect to memory, run queues, etc.
Stacktracing with [v]child following from above perf-exec halts at perf-exec. According to the syslogs and audit logs, perf-exec never consumes cycles and never changes wchan or state.
The initial failure is intermittent: all failures have occurred within the first four immediately consecutive attempts, with most occurring on the second attempt. This is not changed by writing to the default “perf.data” vs a named “-o” target, or by writing to NFS vs local NVMe vs tmpfs vs ramfs.
The minimal reproducer is the combination of the ‘record’ function and one of the indicated counters:
# The following can be run a minimum of 10 immediately consecutive times without issue:
perf stat -e <any combination of cpu-cycles, stalled-cycles-backend, stalled-cycles-frontend, ea_memory, ea_l2> id
perf record -e <any combination of cpu-cycles, ea_memory, ea_l2> id
# The following cause immediate soft lock on the executing core, typically but not always on the second attempt
perf record -e <any combination including stalled-cycles-frontend, stalled-cycles-backend, ea_core> id
Reading counters always succeeds. Repeated series of up to 10 consecutive ‘perf stat’ runs, launched either manually one at a time or via MPI at any scale/placement from 1 to 48 cores, capture all counters without failure.
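For anyone who wants to repeat the series above, it could be scripted roughly as follows. This is only a sketch, not the exact procedure we used: the function name perf_series, the EXECUTE switch and the per-attempt output file names are made up for illustration. It is a dry run by default and only prints the commands, since actually running the ‘record’ cases may soft-lock a core on an affected node.

```shell
#!/bin/sh
# Sketch of the repeat series described above (hypothetical helper).
# Dry run by default: commands are printed, not executed.
# Set EXECUTE=1 to really run them; on an affected node this may hang a core.

# perf_series MODE EVENTS ATTEMPTS
#   MODE     : perf subcommand, "stat" or "record"
#   EVENTS   : comma-separated event list passed to -e
#   ATTEMPTS : number of immediately consecutive runs
# Note: plain POSIX sh, so the variables below are global, not function-local.
perf_series() {
    mode=$1; events=$2; attempts=$3
    i=1
    while [ "$i" -le "$attempts" ]; do
        if [ "$mode" = record ]; then
            # One output file per attempt (illustrative naming) so a truncated
            # file can be matched to the attempt that produced it.
            out="perf.$(printf '%s' "$events" | tr ',' '+').$i.data"
            set -- perf record -e "$events" -o "$out" id
        else
            set -- perf "$mode" -e "$events" id
        fi
        echo "attempt $i: $*"
        if [ "${EXECUTE:-0}" = 1 ]; then
            "$@" || echo "attempt $i: exit status $?"
        fi
        i=$((i + 1))
    done
}
```

For example, "perf_series record stalled-cycles-frontend 4" prints the four consecutive record commands for one of the failing counters, and "EXECUTE=1 perf_series stat cpu-cycles,ea_l2 10" would run the known-good stat case ten times.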
No failures occur on AMD 7302P or Intel E5645. We do not have other ARM platforms on which to test.
Outstanding updates for the compute image as of now:
20:53:07 root@compute103.godzilla:~ # dnf update
Last metadata expiration check: 0:05:00 ago on Mon May 8 20:48:12 2023.
Dependencies resolved.
====================================================================================
 Package           Architecture  Version                           Repository   Size
====================================================================================
Upgrading:
 emacs             aarch64       1:26.1-7.el8_7.1                  appstream   3.1 M
 emacs-common      aarch64       1:26.1-7.el8_7.1                  appstream    38 M
 emacs-filesystem  noarch        1:26.1-7.el8_7.1                  baseos       69 k
 kmod-yfs          aarch64       2021.05-27.4.18.0_425.19.2.el8_7  auristor    523 k
 libwebp           aarch64       1.0.0-8.el8_7                     appstream   245 k
 yfs               aarch64       2021.05-27.el8                    auristor    3.0 M
 yfs-client        aarch64       2021.05-27.el8                    auristor     75 k
 yfs-dumptools     aarch64       2021.05-27.el8                    auristor     45 k
 yfs-fuse          aarch64       2021.05-27.el8                    auristor     16 k
 yfs-pam           aarch64       2021.05-27.el8                    auristor     20 k
Transaction Summary
====================================================================================
Upgrade 10 Packages
Total download size: 45 M
Is this ok [y/N]: n
Operation aborted.