I saw this happen in real life this week. A server (not built by me) became unresponsive: ping was working fine, but ssh took about 15 seconds to log in.
After login, I could see a load average of 226, low free memory and heavy swapping; CPU was not that high, around 75%. The oom-killer was not activating. The server was effectively “stuck”.
After about 30 minutes the oom-killer did activate, but it killed a tiny process and new ones kept being created, so the server was still “stuck”.
Stopping httpd returned everything to normal; after starting httpd again, it ran fine for days without the issue.
In this case, the cause was bad application design combined with poor server configuration. The application spawns processes that do some kind of image manipulation and are resource intensive, but the server was configured to allow anything (no limits set).
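For illustration, this is roughly the kind of limit that was missing. A minimal Python sketch (the convert-image command and the 512 MiB cap are placeholders, not the real application) that caps each spawned worker's address space with RLIMIT_AS, so a runaway image job fails on its own instead of dragging the whole box into swap:

```python
import resource
import subprocess

# Hypothetical per-worker cap: 512 MiB of virtual address space.
WORKER_MEM_LIMIT = 512 * 1024 * 1024

def limit_worker_memory():
    # Runs in the child just before exec: cap the address space so a runaway
    # image job gets an allocation failure instead of pushing the host into swap.
    resource.setrlimit(resource.RLIMIT_AS, (WORKER_MEM_LIMIT, WORKER_MEM_LIMIT))

def run_image_job(args):
    # "convert-image" is a placeholder for whatever the application spawns.
    return subprocess.run(
        ["convert-image", *args],
        preexec_fn=limit_worker_memory,
        check=False,
    )

if __name__ == "__main__":
    result = run_image_job(["--resize", "800x600", "input.jpg", "output.jpg"])
    print("worker exited with", result.returncode)
```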
The server was a RHEL 9.5 VM guest running on VMware with 4 GB of memory and 2 GB of swap.
So why didn’t oom-killer save the day?
I don’t know, but there are many factors: bare metal vs VM guest, the ratio of RAM to swap, the types of processes, OS settings like swappiness, and in any case killing some random process doesn’t seem very helpful.
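One thing worth knowing: the kernel doesn't pick its victim completely at random, it ranks processes by an oom_score, and swappiness is easy to check too. A quick sketch of reading both straight from /proc (standard paths on RHEL 9; run it on the box to see who the kernel would go after first):

```python
import os

def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

# Global knob: how eagerly the kernel swaps (default 60; higher = swap more eagerly).
print("vm.swappiness =", read_int("/proc/sys/vm/swappiness"))

# Per-process "badness" scores the oom-killer uses to pick its victim.
scores = []
for pid in os.listdir("/proc"):
    if not pid.isdigit():
        continue
    try:
        score = read_int(f"/proc/{pid}/oom_score")
        with open(f"/proc/{pid}/comm") as f:
            comm = f.read().strip()
        scores.append((score, int(pid), comm))
    except (FileNotFoundError, PermissionError):
        continue  # process exited or is inaccessible

# The highest-scoring processes are the ones the kernel would kill first.
for score, pid, comm in sorted(scores, reverse=True)[:10]:
    print(f"{score:6d}  {pid:7d}  {comm}")
```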
This was a server not built by me; I usually set resource limits, and so I don’t see this type of problem.
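When I build these servers I put the limits at the service level. On RHEL 9 that normally means systemd directives such as MemoryMax=, MemorySwapMax= and TasksMax= on the httpd or worker unit, but underneath they are just cgroup v2 files. A rough sketch of those knobs (hypothetical group name, needs root and the memory/pids controllers enabled on the parent cgroup):

```python
import os

# Hypothetical cgroup; in practice these limits are set via systemd, but the
# underlying knobs are plain files under /sys/fs/cgroup on a cgroup v2 host.
CGROUP = "/sys/fs/cgroup/imageworkers"

def write(path, value):
    with open(path, "w") as f:
        f.write(str(value))

os.makedirs(CGROUP, exist_ok=True)
write(f"{CGROUP}/memory.max", 2 * 1024**3)   # cap the whole group at 2 GiB of RAM
write(f"{CGROUP}/memory.swap.max", 0)        # and forbid it from touching swap
write(f"{CGROUP}/pids.max", 50)              # cap how many processes it can spawn

# Move the current process (and so its future children) into the group.
write(f"{CGROUP}/cgroup.procs", os.getpid())
```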