Rockylinux9 is stuck

That is what I hinted at in my posts above and in the link to the RH Customer Portal. But instead of fixing the program code to work with the system, he’s trying to fix the system to work with his code :)

If the system doesn’t have enough resources to handle what the program does, it probably needs more RAM, which I hinted at as well. Obviously you cannot keep adding RAM forever, but since Rocky works in pretty much all scenarios except with his code, I would suggest his program code is the problem.

I ran this script of yours and it hung my machine for a long time, about 10 minutes or so, and then it was killed by the OOM killer.

OK, this is expected. It means your OOM killer is working, but the machine should not become completely unresponsive, and it should not take ten minutes.

It probably means your machine is configured differently: hardware config, swap settings, or other non-default settings in Rocky.

I saw this happen in real life this week. A server (not built by me) became unresponsive; ping was working fine, but ssh took about 15 seconds to log in.

After logging in, I could see a load average of 226, low memory and heavy swapping; CPU usage was not that high, about 75%. The oom-killer was not activating. The server was effectively “stuck”.
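For anyone who wants to check the same symptoms on their own box, here is a rough sketch that reads the numbers I was looking at (load average, available memory, swap use) from `/proc`. The function names `parse_meminfo` and `pressure_summary` are my own for illustration, and it assumes a Linux kernel recent enough to report `MemAvailable`. It decides nothing for you; it only puts the figures side by side.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def pressure_summary(meminfo_text, loadavg_text, cpu_count):
    """Summarise load and memory pressure from raw /proc text."""
    mem = parse_meminfo(meminfo_text)
    load1 = float(loadavg_text.split()[0])  # 1-minute load average
    swap_total = mem.get("SwapTotal", 0)
    swap_used = swap_total - mem.get("SwapFree", 0)
    return {
        "load_per_cpu": load1 / cpu_count,
        "mem_available_pct": 100.0 * mem.get("MemAvailable", 0) / mem["MemTotal"],
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    with open("/proc/loadavg") as f:
        loadavg = f.read()
    print(pressure_summary(meminfo, loadavg, os.cpu_count() or 1))
```

A load per CPU far above 1 together with low available memory and high swap use is roughly the picture I saw: the machine is busy waiting on swap rather than doing CPU work.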

After about 30 minutes I did see oom-killer activate, but it killed a tiny process and new ones were being created, so the server was still “stuck”.

Stopping httpd returned everything to normal. After starting httpd again, it ran fine for days without the issue.

In this case, the cause was bad application design and poor server configuration. The application spawns processes that do some kind of image manipulation and are resource intensive, but the server was set to just allow anything (no limits set).

The server was a RHEL 9.5 VM guest running on VMware with 4 GB of memory and 2 GB of swap.

So why didn’t oom-killer save the day?

I don’t know, but there are many factors: bare metal vs VM guest, the ratio of RAM to swap, the types of processes, and OS settings like swappiness. And anyway, just killing some random process doesn’t seem very helpful.
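One thing that can be checked: the kernel exposes a per-process score in `/proc/<pid>/oom_score`, and higher scores are preferred victims. The sketch below lists the current top candidates. The function name `rank_by_oom_score` and the `top` parameter are mine, and this only shows the ranking at one instant; it does not explain why a small process was chosen on the server I described.

```python
import os


def rank_by_oom_score(proc_root="/proc", top=5):
    """Return up to `top` (score, pid, name) tuples, highest oom_score first."""
    ranked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited or is not readable
        ranked.append((score, int(entry), name))
    ranked.sort(reverse=True)
    return ranked[:top]


if __name__ == "__main__" and os.path.isdir("/proc"):
    for score, pid, name in rank_by_oom_score():
        print(f"{score:6d}  {pid:7d}  {name}")
```

If many short-lived workers each hold a modest amount of memory, none of them stands out in this list, which would fit with killing one of them not freeing much.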

This was a server not built by me; I usually set limits on resources, and therefore don’t see this type of problem.
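To give an idea of what a limit can look like at the application level, here is a minimal sketch that caps the address space of a spawned worker using `RLIMIT_AS`. This is only one possible approach and not necessarily how I configure servers; the helper name `run_with_memory_cap` and the 1 GiB figure are made up for the example. Note that `RLIMIT_AS` limits virtual address space, not resident memory, and that `preexec_fn` is not safe in multi-threaded parents.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd with its address space capped at max_bytes (Unix only)."""
    def apply_limit():
        # Runs in the child just before exec, so only the child is limited.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    cap = 1024 ** 3  # 1 GiB, an illustrative value
    # A worker that tries to allocate 2 GiB fails on its own
    # instead of pushing the whole machine into swap.
    greedy = [sys.executable, "-c", "x = bytearray(2 * 1024**3)"]
    result = run_with_memory_cap(greedy, cap)
    print("exit code:", result.returncode)
```

With a cap like this the greedy worker gets an allocation failure and exits, and the rest of the server carries on.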