InfiniBand Network HowTo

Dear All,
I want to set up a small HPC installation here at our university using Rocky Linux. We have InfiniBand network hardware available, but I’m very new to this, so I’m looking for a comprehensive “HowTo” which ideally guides me through the setup process. Could anyone point me in the right direction or provide me with some information? The cluster is already up and running with a 10GbE network, but, as mentioned, I’m completely new to the “InfiniBand way” of networking. I found that I will have to use “IP over IB” (IPoIB), but searching the web I’m missing correct/detailed information on configuring it.

Any help/guidance appreciated, thanks in advance!

Will this PDF help?

Hey @mdcTUX, great, thanks a lot, I did not find this via web search. However, am I right to choose IPoIB for setting up an infrastructure for a distributed-memory setup, i.e. for MPI applications driven by the Slurm scheduler?

Hi @hazel, I think you had better wait for someone who has done this before. I have never set up InfiniBand myself.

@hazel I have installed a fair few InfiniBand networks in my time. Happy to help you. Please send me a message (not sure if this is possible here).
Firstly, you have a choice of using the distro-supplied IB utilities and drivers or those provided as a bundle by Mellanox. The Mellanox ‘OFED’ distribution gives more performance, but has to be installed ‘by hand’.
Also, IPoIB is the mechanism for running IP traffic over InfiniBand. It is not strictly necessary, since IB has its own ways of working, but you should definitely set it up.
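As a rough sketch (assuming the distro stack and NetworkManager; the 10.20.30.1/24 address below is only an example, adjust it to your own addressing plan), IPoIB on a node can be brought up with nmcli like this:

nmcli connection add type infiniband ifname ib0 con-name ib0 \
    ipv4.method manual ipv4.addresses 10.20.30.1/24 ipv6.method disabled
nmcli connection up ib0
ip addr show ib0    # should now show the assigned address on the IPoIB interface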

You say you want to set up HPC. There is a layer above Rocky Linux - the cluster management and deployment stack. There are choices here also.
Also, do you know the hardware details of:
your servers
InfiniBand NICs
cables

(the cables don’t really matter, but it is worth pointing out that cables should not be bent excessively in an IB setup)

We can go over some of the diagnostics which show the state of the IB network.
First question here: you should have a Subnet Manager running either on an InfiniBand switch or on a server.
Send the output of:
ofed_info (to show your OFED software stack)
sminfo (to show the subnet manager)
ibhosts (which should list the hosts seen on the network)

Messages (PM/DM) can be sent here, although it would serve the community better if the posts helping to set up the InfiniBand network stayed public in a forum thread like this, since they could help others who may wish to do the same thing in the future :slight_smile:


No problem, Ian.
@hazel An IPoIB setup is not necessary for the operation of an InfiniBand network.
However, it is very useful…
When an InfiniBand host comes up it is recognised by the Subnet Manager and is assigned an address called a Local Identifier (LID).
Choose one of your servers and run these diagnostic commands:
ibstat
ibstatus

On your cluster head node (or any server really) run ‘ibdiagnet’
This will produce a LOT of output.
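A quick way to check the key fields and to keep the ibdiagnet output manageable is plain shell (the grep patterns are only a suggestion):

ibstat | grep -E 'State|Physical state|Base lid|SM lid'    # the port should be Active/LinkUp with a non-zero LID
ibdiagnet | tee /tmp/ibdiagnet.log                         # keep a copy of the full run
grep -iE 'warn|error|fail' /tmp/ibdiagnet.log              # then pull out just the problem lines, if any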

Do you know what cluster management stack is in use here?
You could be using Warewulf, XCAT or Bright - or maybe something else!


The Subnet Manager really is key here. You should either configure an InfiniBand switch to run an SM, or perhaps run it on the cluster head node.
It is a good idea to have a secondary SM ready to take over, but let’s get the primary one sorted out first.
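If you go the head-node route, a minimal sketch with the distro-supplied opensm (assuming the distro IB stack; the package and the service are both called opensm on EL8/EL9):

dnf install opensm
systemctl enable --now opensm
sminfo    # should now report the master SM

A second opensm instance on another node can later act as the standby; the instance with the higher priority becomes the master.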

BTW I am in an EMEA timezone, so there may be a bit of a lag involved.

Hey @HPCJohn and @iwalker, thanks a lot for your messages. I was absorbed by another issue, but I have now had a look at the above-mentioned PDF. Following the installation routines covered there, a first question: do I need to set up everything (including the RDMA parts) as described there?

At the moment I can provide you with the following information: there are Mellanox cards (e.g. Mellanox Technologies MT27800 Family [ConnectX-5]) in the machines, and I can see the ib0 interface when calling ‘ifconfig’, but the above-mentioned commands (ibstat, ibstatus, ibhosts, sminfo, ofed_info) are not available, since, I think, the whole ‘InfiniBand stack’ is not correctly installed.

I plan to do the “basic” installation on the head node first. Second, I plan to do the (more or less) same installation on one of the compute nodes, followed by testing. If everything goes well, I will follow up with the remaining nodes.
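For the testing step I would try a simple point-to-point bandwidth check with the perftest tools first (just a sketch; the node name is a placeholder):

ib_write_bw               # on the head node, acting as the server
ib_write_bw headnode      # on the compute node, pointing at the head node

Would that be a reasonable first check?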

FYI: we use a mix of commercial software (e.g. ANSYS, ABAQUS, …), open source (OpenFOAM, SU2, …) and in-house codes (parallelised with MPI/OpenMP). Together with the setup of the InfiniBand network, I will have to install Slurm as the scheduler. Since we’re talking about a small cluster for our research group (in total about 512 cores and 4 TB RAM, … constantly growing), we only use C3 scripts and none of the more advanced tools mentioned above.

BTW: I’m also in EMEA, but since this topic is something I have to handle with rather low priority besides a lot of other work, there might also be some “lag” on my side :wink:

I am going to take a guess here: I think that you are using the distribution-supplied OFED (InfiniBand) stack.
HOWEVER, it seems strange to me that you have an ib0 interface present but ibstatus does not work.

What happens when you run

dnf group install "InfiniBand Support"

Also, let us for the moment assume that there is a Subnet Manager running on the InfiniBand switch.
Can you access the switch using a web browser?
There are managed and unmanaged switches!
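From any host with the diagnostics installed you can also see where the SM lives (the LID argument is whatever sminfo reports):

sminfo                        # prints the LID and GUID of the active subnet manager
ibswitches                    # lists the switches with their GUIDs
smpquery nodedesc <SM_LID>    # node description of the device holding that LID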

Assuming that all hosts are reachable over the network and that there is an Ansible inventory listing each of them with an IP address (ib_ip_addr) and the name of the IB interface (ib_nic), an Ansible play like the one below (using the rhel-system-roles.network role) should configure IPoIB:

- hosts: all
  vars:
    ib_gateway: 10.20.30.254
    network_connections:
    - name: ib0
      type: infiniband
      interface_name: "{{ ib_nic }}"
      ip:
        dhcp4: no
        ipv6_disabled: true
        gateway4: "{{ ib_gateway }}"
        address:
        - "{{ ib_ip_addr }}/24"
        dns:
        - "{{ ib_gateway }}"
        zone: trusted
  tasks:
  - name: Install Infiniband utilities
    vars:
      __dnf_group_infiniband:
      # Mandatory
      - libibverbs
      - libibverbs-utils
      - librdmacm
      - librdmacm-utils
      - rdma-core
      # Default
      - ibacm
      - infiniband-diags
      - iwpmd
      - libibumad
      - mstflint
      - perftest
      - srp_daemon
    ansible.builtin.dnf:
      name: "{{ __dnf_group_infiniband }}"
      state: present
    when:
    - ansible_distribution_major_version is version( '9', '==' )
    - ib_ip_addr is defined

  - ansible.builtin.include_role:
      name: rhel-system-roles.network
    when:
    - network_connections is defined
    - network_connections|length > 0

Note that I set the connection’s FirewallD zone to trusted, as I assume the Slurm traffic will prefer that …
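To make the play concrete, with a (hypothetical) inventory like the one below the run is just the usual ansible-playbook call (the playbook file name is arbitrary):

# inventory – variable names match those used in the play above
node01  ib_nic=ib0  ib_ip_addr=10.20.30.1
node02  ib_nic=ib0  ib_ip_addr=10.20.30.2

ansible-playbook -i inventory ipoib.yml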

My thoughts are that, as the InfiniBand hardware is already in place, IB is probably working already.
To find out what the status is we need to run some diagnostics, then maybe run an MPI program with a verbose flag, which will tell us how the job is being set up (i.e. the transport layer).
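As an example of such a verbose run, assuming Open MPI (the MCA verbosity flags below are Open MPI specific; the host names and the program are placeholders):

mpirun -np 2 --host node01,node02 \
       --mca pml_base_verbose 10 --mca btl_base_verbose 10 \
       ./your_mpi_program
# the output shows which PML/BTL (e.g. ucx, or openib/tcp) is selected as the transport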

First off, can we find out which cluster management stack is in operation here?
Do you get any message on the console when you log in?

If I said Bright, OpenHPC, Warewulf, XCAT, would any of those be recognisable?
Was the cluster installed by a company?

OK, the InfiniBand hardware was installed recently, so the OS was set up without the Mellanox cards… I didn’t know that I could just do dnf group install "Infiniband Support". I will check this and come back to you asap.

OK, unfortunately I cannot find the group “Infiniband Support”. I’m running Rocky 8.8:

$ cat /etc/os-release 
NAME="Rocky Linux"
VERSION="8.8 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.8 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"

We cannot use a newer kernel for software compatibility reasons. If I list the available groups:

dnf group list
Last metadata expiration check: 2:25:28 ago on Mon 15 Apr 2024 06:02:53 CEST.
Available Environment Groups:
   Server with GUI
   Minimal Install
   Workstation
   KDE Plasma Workspaces
   Virtualization Host
   Custom Operating System
Installed Environment Groups:
   Server
Installed Groups:
   Headless Management
   Xfce
Available Groups:
   Container Management
   .NET Core Development
   RPM Development Tools
   Development Tools
   Graphical Administration Tools
   Legacy UNIX Compatibility
   Network Servers
   Scientific Support
   Security Tools
   Smart Card Support
   System Tools
   Fedora Packager

I cannot find any “InfiniBand”-related group in the list.

However, I can see the device…

$ nmcli device
DEVICE  TYPE        STATE         CONNECTION 
bond0   bond        connected     bond0      
eno1    ethernet    connected     bond0_p0   
eno2    ethernet    connected     bond0_p1   
ib0     infiniband  disconnected  --         
lo      loopback    unmanaged     --         

$ nmcli con sh
NAME      UUID                                  TYPE        DEVICE 
bond0     ad33d8b0-1f7b-cab9-9447-ba07f855b143  bond        bond0  
bond0_p0  37b1c06d-676d-4965-8e0d-75f825068948  ethernet    eno1   
bond0_p1  368b6d67-b04d-4fa2-8da2-627cdb49ea14  ethernet    eno2   
ib0       4e381896-c6ac-42d9-ba4f-1f082463b168  infiniband  --     

Strange… dnf groupinstall "InfiniBand Support" did succeed and installed a couple of packages!

Now I can provide you with the information:

$ ibstat
CA 'mlx5_0'
	CA type: MT4119
	Number of ports: 1
	Firmware version: 16.28.1002
	Hardware version: 0
	Node GUID: 0x043f720300f80b0a
	System image GUID: 0x043f720300f80b0a
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 3
		LMC: 0
		SM lid: 1
		Capability mask: 0x2659e848
		Port GUID: 0x043f720300f80b0a
		Link layer: InfiniBand
$ ibstatus 
Infiniband device 'mlx5_0' port 1 status:
	default gid:	 fe80:0000:0000:0000:043f:7203:00f8:0b0a
	base lid:	 0x3
	sm lid:		 0x1
	state:		 4: ACTIVE
	phys state:	 5: LinkUp
	rate:		 100 Gb/sec (4X EDR)
	link_layer:	 InfiniBand

Some groups are hidden in the default list.

# dnf group list -v --hidden | grep -i infi
   Infiniband Support (infiniband)

Please drop me an email at hearnsj@gmail.com

Please now run

ibhosts
sminfo

ibdiagnet - no need to give the entire output of ibdiagnet