Hello,
We are testing Rocky on a Grace Hopper (GH200) server.
When attempting to load the nvidia-peermem kernel module via modprobe, we encountered an error related to a kernel symbol version mismatch.
Specs:
- OS: Rocky Linux 9.3 (Blue Onyx)
- Kernel: 5.14.0-362.8.1.el9_3.aarch64+64k
modprobe:
$ sudo modprobe nvidia-peermem
modprobe: ERROR: could not insert 'nvidia_peermem': Invalid argument
dmesg:
[348143.219153] nvidia_peermem: disagrees about version of symbol ib_register_peer_memory_client
[348143.219156] nvidia_peermem: Unknown symbol ib_register_peer_memory_client (err -22)
kallsyms:
$ cat /proc/kallsyms | grep ib_register_peer_memory_client
0000000000000000 r __kstrtab_ib_register_peer_memory_client [ib_core]
0000000000000000 r __kstrtabns_ib_register_peer_memory_client [ib_core]
0000000000000000 r __ksymtab_ib_register_peer_memory_client [ib_core]
0000000000000000 T ib_register_peer_memory_client [ib_core]
So the running kernel does export the symbol ib_register_peer_memory_client, through the ib_core module.
We are not sure how to pinpoint the origin of the version mismatch.
It also seems that it is not possible to obtain the symbol's version string for diagnostic purposes.
We would much appreciate your insights on this issue.
Thanks.