I have a small HPC cluster running under Rocky Linux 8.5.
When I connect via SSH between the nodes, the connection is sometimes extremely slow, especially when the node’s CPU is at 100% usage: it takes about 30-60 seconds to connect. Even when the node is completely idle, the connection takes several seconds.
I’m sure it’s not a network problem: we have a dedicated 10G LAN.
High CPU usage seems to be the cause, but our old cluster running CentOS 6.5 does not have this problem: even with the CPU at 100%, the connection is instantaneous.
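To put numbers on this, here is a minimal sketch of how the delay could be split into its network and server-side parts. It assumes sshd listens on port 22 and that Python 3 is available on the nodes. The function name `time_ssh_banner` is made up for illustration. It times the TCP connect separately from the wait for the server's SSH identification string, so a fast connect followed by a slow banner would point at the server rather than the LAN. It does not cover key exchange or authentication, so it only measures the first part of a login.

```python
import socket
import sys
import time


def time_ssh_banner(host, port=22, timeout=90.0):
    """Time the TCP connect and the wait for the SSH identification string.

    Returns (connect_seconds, banner_seconds, banner_text).
    Hypothetical helper for diagnosis only; it never authenticates.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.perf_counter()
        # An SSH server sends a line starting with "SSH-" once it is
        # ready to talk, so this wait reflects server-side start-up time.
        raw = sock.recv(256)
        got_banner = time.perf_counter()
    banner = raw.decode("ascii", "replace").strip()
    return connected - start, got_banner - connected, banner


if __name__ == "__main__":
    # Host names are taken from the command line.
    for host in sys.argv[1:]:
        connect_s, banner_s, banner = time_ssh_banner(host)
        print(f"{host}: connect {connect_s:.3f}s, banner {banner_s:.3f}s, {banner}")
```

Saved under a name such as `ssh_timing.py` (hypothetical), it could be run as `python3 ssh_timing.py node01 node02` against both an idle and a fully loaded node, and against the old CentOS 6.5 cluster for comparison. If the banner arrives quickly but the login is still slow, `ssh -vvv` shows which later step stalls.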
Does anyone have any ideas or suggestions on how to solve this?