Dear All,
I want to set up a small HPC installation here at our university using Rocky Linux. We have Infiniband network hardware available, but I’m very new to this, so I’m looking for a comprehensive “HowTo” that ideally guides me through the setup process. Could anyone point me in the right direction or provide me with some information? The cluster is already up and running with a 10GbE network, but, as mentioned, I’m completely new to the “Infiniband way” of networking. I found that I will have to choose “IP over IB”, but searching the web I’m missing correct/detailed information on configuring it.
Hey @mdcTUX, great, and thanks a lot, I did not find this via web search. However, am I right to choose IPoIB for setting up an infrastructure for a Distributed Memory setup, i.e. for MPI applications driven by the Slurm scheduler?
@hazel I have installed a fair few Infiniband networks in my time. Happy to help you. Please send me a message (not sure if this is possible here)
Firstly, you have a choice of using the distro-supplied IB utilities and drivers or those provided as a bundle by Mellanox. The Mellanox ‘OFED’ distribution offers better performance, but has to be installed ‘by hand’.
Also, IPoIB is the mechanism for running IP traffic over Infiniband. It is not strictly necessary, since IB has its own ways of working, but you should definitely set it up.
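As a minimal sketch, an IPoIB interface can be configured with NetworkManager like this (the device name ib0 and the 10.10.0.0/24 address here are placeholders, not taken from your setup):

nmcli con add type infiniband ifname ib0 con-name ib0 ipv4.method manual ipv4.addresses 10.10.0.1/24
nmcli con up ib0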
You say you want to set up HPC. There is a layer above Rocky Linux - the cluster management and deployment stack. There are choices here also.
Also, do you know the hardware details of:
your servers
Infiniband NIC cards
cables
(the cables don’t really matter, however it is worth pointing out that cables should not be bent excessively in an IB setup)
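To gather those details, a quick sketch (assuming the pciutils and libibverbs-utils packages are installed):

lspci | grep -i mellanox   # shows the model of the IB NIC(s)
ibv_devinfo                # shows firmware version, ports and link state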
We can go over some of the diagnostics which show the state of the IB network.
First question here - you should have a Subnet Manager running either on an Infiniband switch or on a server.
Send the output of:
ofed_info (to show your OFED software stack)
sminfo (to show the subnet manager)
ibhosts (which should list the hosts seen on the network)
Messages (PM/DM) can be sent here, although it would serve the community better if the posts helping to set up the Infiniband network are public in a forum post like this, since they could help others who may wish to do the same thing in the future.
No problems Ian. @hazel IPOIB setup is not necessary for the operation of an Infiniband network.
However it is very useful…
When an Infiniband host comes up it is recognised by the Subnet Manager and is assigned an address called a Local Identifier (LID)
Choose one of your servers and run these diagnostic commands:
ibstat
ibstatus
On your cluster head node (or any server really) run ‘ibdiagnet’
This will produce a LOT of output.
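If you want to capture that output to share it here, something like this works (the log path is just a placeholder):

ibdiagnet 2>&1 | tee /tmp/ibdiagnet.log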
Do you know what cluster management stack is in use here?
You could be using Warewulf, XCAT or Bright - or maybe something else!
The Subnet Manager really is key here. You should either configure an Infiniband switch to run an SM, or perhaps run it on the cluster head node.
It is a good idea to have a secondary SM ready to take over, but let’s get the primary one sorted out first.
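If you go the head-node route, a minimal sketch using the distro packages (assuming the package and service are both named opensm, as in the Rocky repos) is:

dnf install opensm
systemctl enable --now opensm
sminfo   # should now report this node as the subnet manager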
BTW I am in an EMEA timezone, so there may be a bit of a lag involved.
Hey @HPCJohn and @iwalker, thanks a lot for your messages. I was absorbed by another issue, but I had a look into the above-mentioned PDF. First, following the installation routines covered there, I have a question: do I need to set up everything (also the RDMA things) mentioned there?
At the moment I can provide you with the following information: there are Mellanox cards (e.g. Mellanox Technologies MT27800 Family [ConnectX-5]) in the machines, and I can see the ib0 interface when calling ‘ifconfig’, but the above-mentioned commands (ibstat, ibstatus, ibhosts, sminfo, ofed_info) are not available, since - I think - the whole ‘Infiniband stuff’ is not correctly installed.
I plan to do the “basic” installation on the head node first. Second, I plan to do (more or less) the same installation on one of the nodes, followed by testing. If everything goes well, I will follow up with the remaining nodes.
FYI: we use a mix of commercial software (e.g. ANSYS, ABAQUS, …), open source (OpenFOAM, SU2, …) and in-house codes (parallelisation done with MPI/OpenMP). Together with the setup of the Infiniband network, I will have to install Slurm as the scheduler. Since we’re talking about a small cluster for our research group (in total about 512 cores and 4 TB RAM, … constantly growing), we only use C3 scripts, but none of the more evolved tools mentioned above.
BTW: I’m also in EMEA, but since this topic is something I have to do with rather low priority besides a lot of other work, there might also be some “lag” on my side.
I am going to take a guess here and I think that you are using the distribution-supplied OFED (Infiniband) stack.
HOWEVER it seems strange to me that you have an ib0 interface present, but ibstatus does not work.
Also let us for the moment assume that there is a Subnet Manager running on the Infiniband switch.
Can you access the switch using a web browser?
There are managed and unmanaged switches!
Assuming that all hosts can be reached over the network and there is an Ansible inventory that lists the IP address (ib_ip_addr) and the name of the IB interface (ib_nic) for each of them, then an Ansible play like the one below (which makes use of the rhel-system-roles.network role) should configure IPoIB:
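A minimal sketch of such a play (the ib_hosts group name and the /24 prefix are my assumptions, adjust to your inventory):

---
- hosts: ib_hosts
  become: true
  vars:
    network_connections:
      # One IPoIB profile per host, built from the inventory variables above
      - name: ipoib0
        type: infiniband
        interface_name: "{{ ib_nic }}"
        state: up
        ip:
          dhcp4: false
          auto6: false
          address:
            - "{{ ib_ip_addr }}/24"
  roles:
    - rhel-system-roles.network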
My thoughts are that as the Infiniband hardware is already in place then IB is probably working already.
To find out what the status is we need to run some diagnostics, then maybe run an MPI program with a verbose flag which will tell us how the job is being set up (i.e. the transport layer).
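For example, with Open MPI something like this (the hostnames and the hello_mpi binary are placeholders) shows which transport layer is selected:

mpirun -np 2 --host node01,node02 --mca pml_base_verbose 10 ./hello_mpi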
First off, can we find out which cluster management stack is in operation here?
Do you get any message on the console when you log in?
If I said Bright, OpenHPC, Warewulf, XCAT, would any of those be recognisable?
Was the cluster installed by a company?
Ok, the Infiniband hardware was installed recently, so the OS was set up without the Mellanox cards… I didn’t know that I could just do dnf group install "Infiniband Support". I will check this and come back to you asap.
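If I understand correctly, the minimal steps on my side would be something like (just my sketch):

dnf -y group install "Infiniband Support"
ibstat   # should now list the ConnectX-5 ports (ibstat comes with infiniband-diags)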
We cannot use a newer kernel for software compatibility reasons. If I list the available groups:
dnf group list
Last metadata expiration check: 2:25:28 ago on Mon 15 Apr 2024 06:02:53 CEST.
Available Environment Groups:
   Server with GUI
   Minimal Install
   Workstation
   KDE Plasma Workspaces
   Virtualization Host
   Custom Operating System
Installed Environment Groups:
   Server
Installed Groups:
   Headless Management
   Xfce
Available Groups:
   Container Management
   .NET Core Development
   RPM Development Tools
   Development Tools
   Graphical Administration Tools
   Legacy UNIX Compatibility
   Network Servers
   Scientific Support
   Security Tools
   Smart Card Support
   System Tools
   Fedora Packager
$ nmcli device
DEVICE TYPE STATE CONNECTION
bond0   bond         connected      bond0
eno1    ethernet     connected      bond0_p0
eno2    ethernet     connected      bond0_p1
ib0     infiniband   disconnected   --
lo      loopback     unmanaged      --
$ nmcli con sh
NAME UUID TYPE DEVICE
bond0 ad33d8b0-1f7b-cab9-9447-ba07f855b143 bond bond0
bond0_p0 37b1c06d-676d-4965-8e0d-75f825068948 ethernet eno1
bond0_p1 368b6d67-b04d-4fa2-8da2-627cdb49ea14 ethernet eno2
ib0 4e381896-c6ac-42d9-ba4f-1f082463b168 infiniband --
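Since ib0 already has a connection profile but shows as disconnected, I guess that once the IB stack is installed the next step would be something like this (the 10.10.0.10/24 address is only a placeholder):

nmcli con mod ib0 ipv4.method manual ipv4.addresses 10.10.0.10/24
nmcli con up ib0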