OpenFabrics ibutils package and ibdiagnet utility

I don’t see an ibutils package in either Rocky 8.6 or 9, or in EPEL 8 or 9, and there’s no ibdiagnet substring in any of the associated mirror/repo metadata. OpenFabrics still bundles it as “ibutils” and “ibdiagnet”, which is how it still appears under all the non-RHEL-derived bundles I can think of. Where did it get renamed/repackaged/migrated for EL8 in the RHEL world? Yes, “ibutils” as a package is long deprecated, but whether you’re running a native IB fabric with IPoIB and RDMAoIB or you’re running RoCE, you still need something to diagnose credit loops on the fabric, assemble CA/SW and SW/SW path capabilities, produce the fabric qualities report, etc. At least, if you want to get anywhere close to minmax-ing the fabric you do…

Please tell me NVIDIA isn’t the only place left to go for ibdiagnet…

Chapter 7 of https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/pdf/configuring_infiniband_and_rdma_networks/red_hat_enterprise_linux-8-configuring_infiniband_and_rdma_networks-en-us.pdf covers doing this with utilities from the infiniband-diags package. That package doesn’t seem to provide an actual ibdiagnet, but perhaps it has what you need?

Nope, that’s just “are *verbs, *cm, and point-to-point IB and RDMA working”. It’s the connectivity tests for IB frames and RDMA over IB, with and without the IB connection manager agent, with a few related capabilities tossed in (but mostly not IPoIB, which is tested by normal IP tools once the underlying IB is working). Imagine tools to test IP-encapsulated protocol connectivity (ICMP ping, nmap, etc.) but translated from IP to IB; that’s the point of infiniband-diags. The legacy ibdiagnet, together with ibnetdiscover, are like triage tools for L4+ switch flow control protocols across converged fabrics, but for IB instead of IP.

I can’t find it anywhere else. NVIDIA seems like the only place: InfiniBand Management Tools | NVIDIA

Perhaps we could add it to the Rocky Linux HPC SIG so that there’s a proper repo for it.

Looking at the canonical NVIDIA mlnx_ofed, will report back.

@brian I think you’re right, looks like OpenFabrics has pretty much dried up wrt ibutils. NVIDIA does provide a wget’able repo, but like their Linux GPU driver repos there’s no rsync and it’s not exactly repodef-standard. It does include the sources now, though. Don’t know if it’s worth the effort on Rocky’s side?

The community could really use more IB vendors, thus far nobody’s come close to filling the old Mellanox shoes.

Can confirm the NVIDIA mlnx_ofed ibutils and ibdiagnet are compatible with the stock Rocky 8.6 IB stack for the following functions (edge cases unknown, but the main stuff works):

  • reads SM cached HBA, port, transceiver, SW models
  • reads above via device query, cross-compares correctly
  • picks up extended port attrs and correctly matches VS/AR info
  • guid de-dup good (note untested edge cases for unmanaged CAs)
  • MAD, firmware, pkey, rm/sm/chan queries good

There are the usual NVIDIA issues with package signing keys and the duplication of package names (repo priority should still work fine for that). It’s probably not kosher to mix legacy and NVIDIA OFEDs, and that’ll come back to bite me, but mixing the NVIDIA ibdiagnet/ibnetdiscover tools in with Rocky IB [u]mad, cm, sm, and *verbs works “good enough(?)” for the major use cases.
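
For anyone wanting to try the same mix, here is a rough sketch of how the repo side might be set up so that only the diagnostic tools come from the NVIDIA repo. It is a small Python helper that writes a dnf `.repo` stanza using the standard dnf options `priority` and `includepkgs`. Everything specific in it is an assumption on my part: the repo id, the `ibutils2*` package pattern (taken from the package name listed in this thread), and the base URL, whose host is left as a placeholder because the thread only gives the path. Dependencies of ibutils2 may need to be added to the include list.

```python
import configparser
import io


def build_repo_stanza(base_url, repo_id="mlnx-ofed-tools", priority=99,
                      include_pkgs=("ibutils2*",), gpgkey=None):
    """Return the text of a dnf .repo stanza limited to selected packages.

    repo_id, the default package pattern, and the priority value are
    illustrative assumptions, not values published by NVIDIA or Rocky.
    """
    # interpolation=None so a '%' in a URL is written out literally
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg[repo_id] = {
        "name": "MLNX_OFED diagnostic tools only",
        "baseurl": base_url,
        "enabled": "1",
        "gpgcheck": "1",
        # dnf picks the candidate from the repo with the LOWEST priority
        # value (default 99); give the Rocky repos a smaller number if
        # they should win for duplicated package names.
        "priority": str(priority),
        # Only these package globs are visible from this repo, so the
        # rest of the vendor stack cannot replace distro packages.
        "includepkgs": " ".join(include_pkgs),
    }
    if gpgkey:
        cfg[repo_id]["gpgkey"] = gpgkey
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder host: substitute the real repo host yourself.
    url = "https://REPO-HOST.invalid/public/repo/mlnx_ofed/latest/rhel8.6/x86_64"
    print(build_repo_stanza(url))
```

The output would go into a file under `/etc/yum.repos.d/`. With `includepkgs` doing the filtering, the duplicated package names never become candidates in the first place, so `priority` is more of a second line of defence here.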

repo: nvidia: Index of /public/repo/mlnx_ofed/latest/rhel8.6/x86_64
tools: nvidia: ibutils2-2.1.1-0.151.MLNX20220720.gcd746c3.57102.x86_64
drivers: rocky: kernel-{core,modules}-4.18.0-372.19.1.el8_6.x86_64
*cm: rocky: [lib]ibacm-[utils-]37.2-1.el8.x86_64, ditto verbs
sm: rocky: opensm-3.3.24-1.el8.x86_64
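
Since mixing stacks is the risky part, it may help to audit which installed packages came from which side. Below is a minimal Python sketch that splits `name-version-release.arch` strings like the ones above and guesses the origin. The "MLNX in the release field means NVIDIA" rule is only a heuristic inferred from the single ibutils2 package string in this thread, not a documented convention, and the function names are made up for illustration.

```python
from typing import NamedTuple


class Nvra(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


def parse_nvra(pkg: str) -> Nvra:
    """Split 'name-version-release.arch' into its four parts."""
    # The arch is everything after the last dot.
    nvr, arch = pkg.rsplit(".", 1)
    # Version and release never contain '-', so split from the right;
    # the name itself may contain dashes (e.g. kernel-core).
    name, version, release = nvr.rsplit("-", 2)
    return Nvra(name, version, release, arch)


def guess_origin(pkg: Nvra) -> str:
    """Heuristic (assumption): an MLNX tag in the release means NVIDIA."""
    return "nvidia" if "MLNX" in pkg.release else "distro"


if __name__ == "__main__":
    installed = [
        "ibutils2-2.1.1-0.151.MLNX20220720.gcd746c3.57102.x86_64",
        "kernel-core-4.18.0-372.19.1.el8_6.x86_64",
        "opensm-3.3.24-1.el8.x86_64",
    ]
    for line in installed:
        p = parse_nvra(line)
        print(f"{guess_origin(p):7s} {p.name} {p.version}-{p.release}")
```

In practice the input list would come from something like `rpm -qa` output rather than being typed in by hand.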