InfiniBand Network HowTo

Since the network infrastructure (e.g. switches) is managed by someone else, I have no access to the IB switch.
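Even without access to the switch, you can still check the InfiniBand link from the host side. Here is a minimal sketch, assuming a Linux host with the InfiniBand kernel drivers loaded. Those drivers expose each port's state and rate under /sys/class/infiniband. Device names such as mlx5_0 depend on your adapters, and the ibstat tool from infiniband-diags reports the same information.

```python
from pathlib import Path

def read_ib_ports(sysfs_root="/sys/class/infiniband"):
    """Return (device, port, state, rate) for every InfiniBand port found under sysfs_root."""
    ports = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return ports  # no InfiniBand driver loaded (or not a Linux host)
    for dev in sorted(root.iterdir()):
        ports_dir = dev / "ports"
        if not ports_dir.is_dir():
            continue
        # Port directories are numbered 1, 2, ...
        for port in sorted(ports_dir.iterdir(), key=lambda p: int(p.name)):
            state = (port / "state").read_text().strip()  # e.g. "4: ACTIVE"
            rate = (port / "rate").read_text().strip()    # e.g. "100 Gb/sec (4X EDR)"
            ports.append((dev.name, int(port.name), state, rate))
    return ports

if __name__ == "__main__":
    for dev, port, state, rate in read_ib_ports():
        print(f"{dev} port {port}: {state}, {rate}")
```

A port that shows anything other than ACTIVE (for example "1: DOWN" or "2: INIT") is worth raising with whoever manages the switch. On IB fabrics, INIT often means that no subnet manager is running.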

How are these systems being installed?
I would highly advise you to look at a cluster management framework such as:
Qlustar
Dell Omnia
Bright
OpenHPC https://openhpc.community/
Warewulf (CIQ)

I apologise for throwing a list like that at you. Please be aware that if you are deploying these systems ‘by hand’, there are frameworks out there that can be used to deploy and monitor an HPC cluster.
If we can find out more about the hardware you are using and the company who supplied it, then maybe we can say which framework is better for you.

Qlustar https://qlustar.com/
Omnia (Dell/Omnia)

@HPCJohn Sorry, but since we cannot afford to hire a professional, experienced sysadmin in our research group, we are doing our best to set up our rather small cluster “by hand”. I absolutely agree that using a cluster management tool would improve the overall situation. I do the administration here part-time, and I’m not even a sysadmin, just a member of the research group. I’m aware that the situation is, to put it kindly, sub-optimal, but there is no alternative yet. I apologise for the inconvenience; I was just looking for some help and support. I will have a look at the tools you mentioned.

We are mainly using HP ProLiant servers plus two older Supermicro servers, a kind of “poor man's cluster” :wink:. They were not all bought at once, so the current infrastructure has evolved and grown over time.

Which of the cluster management tools would you recommend starting with, given that I have limited time to go into them in depth?

I would suggest that you join the community at HPC Social https://hpc.social/
Everyone there is friendly. Join the Slack discussion hpcsocial.slack.com

There is a community of HPC people with a background in science and engineering who can help.

For the cluster management stack, it is difficult to make a decision quickly.
For cluster management:
You seem to be in Germany, so contact Qlustar https://qlustar.com/

Look at the OpenHPC 2.x wiki (openhpc/ohpc on GitHub).

I will now make things more complicated… again apologies.
If you put the BMC/IPMI management interfaces of your servers onto a network, you will save a lot of time and effort and will also be able to control and monitor your servers remotely.
Each server has a separate management interface, which you can connect to a cheap 1 Gbps switch. Any switch will do the job, even a retired one from the store cupboard.
HP calls its management processor iLO and Supermicro calls it IPMI.
Alternatively, you can use the main Ethernet port to reach iLO/IPMI; this is a BIOS setting.
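Once the BMCs are reachable on a management network, you can poll them all from one machine. Here is a minimal sketch, assuming ipmitool is installed on that machine and IPMI-over-LAN is enabled on each iLO/IPMI controller. It also assumes the BMC password is exported in the IPMI_PASSWORD environment variable, which ipmitool's -E flag reads. The hostnames and the "admin" user name are hypothetical.

```python
import subprocess

# Hypothetical BMC hostnames: replace with the names or IPs on your management network.
BMC_HOSTS = ["node01-ilo", "node02-ilo", "node03-ipmi"]

def power_status_cmd(host, user):
    """Build the ipmitool command that asks one BMC for its chassis power state.
    -E makes ipmitool take the password from the IPMI_PASSWORD environment variable,
    so it does not show up in the process list."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", "status"]

def parse_power_status(output):
    """Map ipmitool's 'Chassis Power is on' / 'Chassis Power is off' reply to 'on' or 'off'."""
    text = output.strip().lower()
    if text.endswith(" is on"):
        return "on"
    if text.endswith(" is off"):
        return "off"
    return "unknown"

def poll(hosts, user, timeout=15):
    """Query each BMC in turn; hosts that fail, time out or lack ipmitool are 'unreachable'."""
    results = {}
    for host in hosts:
        try:
            proc = subprocess.run(power_status_cmd(host, user),
                                  capture_output=True, text=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            results[host] = "unreachable"
            continue
        if proc.returncode == 0:
            results[host] = parse_power_status(proc.stdout)
        else:
            results[host] = "unreachable"
    return results
```

With IPMI_PASSWORD exported, `print(poll(BMC_HOSTS, "admin"))` prints a dict mapping each host to its power state. The same command pattern works for `chassis power on`, `off` and `cycle`, and for `sdr list` (sensor readings), which is where most of the monitoring value comes from.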

Yes, that’s right. The servers will have three interfaces:
iLO/IPMI
Main Ethernet
InfiniBand
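Keeping the three networks on separate subnets, with the same host number on each, makes the layout easy to remember. The sketch below generates /etc/hosts entries for such a plan. The subnets, the -ipmi/-ib name suffixes and the starting host number 11 are hypothetical choices. The ib subnet only matters if you run IP over InfiniBand (IPoIB).

```python
import ipaddress

# Hypothetical address plan: one subnet per network, same host number on each.
NETWORKS = {
    "ipmi": ipaddress.ip_network("10.0.0.0/24"),  # iLO/IPMI, on the cheap 1 Gbps switch
    "eth":  ipaddress.ip_network("10.1.0.0/24"),  # main Ethernet
    "ib":   ipaddress.ip_network("10.2.0.0/24"),  # IP over InfiniBand (IPoIB)
}
SUFFIX = {"ipmi": "-ipmi", "eth": "", "ib": "-ib"}

def hosts_entries(node_names, first_host=11):
    """Return /etc/hosts lines giving each node one address on every network."""
    lines = []
    for i, name in enumerate(node_names):
        for net_name, net in NETWORKS.items():
            addr = net.network_address + first_host + i
            # Refuse addresses that fall outside the subnet or hit its broadcast address.
            if addr not in net or addr == net.broadcast_address:
                raise ValueError(f"{net} has no room for {name}")
            lines.append(f"{addr}\t{name}{SUFFIX[net_name]}")
    return lines

if __name__ == "__main__":
    print("\n".join(hosts_entries(["node01", "node02"])))
```

With this scheme, node01 is 10.0.0.11 on the management network, 10.1.0.11 on Ethernet and 10.2.0.11 on InfiniBand. A glance at any address then tells you both the node and the network.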

Welcome to HPC :slight_smile: