We have an RL9 VM with 64G RAM and a convert process that routinely runs on it. Occasionally it converts a number of large files and calls for a lot of memory. At the point that it hits 32G, the oom-killer kills the process and the process that called it. For the record, this VM routinely runs low on its RAM, staying around 5-12G with the occasional spike due to convert demanding a ton of RAM.
I have found how to alter the files in /proc on the fly at runtime for that specific process, but once convert finishes and the process exits, that won’t help when the next convert is called with a new PID. Maybe I’m missing something, but I cannot seem to find a way to tune the oom-killer for a specific, dynamically spawned process, and I do not want to turn off the oom-killer entirely. Does anyone know of a way to set the oom-killer for “convert” to give it more room before killing it? Something that also survives reboots?
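For reference, the on-the-fly change I’ve been making looks roughly like this (the pgrep lookup and the value are just examples of what I’m doing by hand):

# Find the newest running convert and relax its OOM score for this run only.
PID=$(pgrep -n convert)
echo -500 > /proc/$PID/oom_score_adj   # negative = less likely to be picked by the oom-killer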
For the record, this VM routinely runs low on its RAM, staying around 5-12G with the occasional spike due to convert demanding a ton of RAM.
5-12GB free without the conversion process? How do you hope to fit a process that may consume up to 32GB under those conditions? It seems to me that the oom killer isn’t an impediment here, but something that actively tries to prevent an OS crash due to insufficient memory. If you suppress the oom killer from killing your conversion process, it may kill something else. The solution is to increase memory.
As for dynamically adjusting the OOM score, you can wrap the execution with a script such as:
#!/bin/bash
# Start the real command in the background so we can grab its PID.
/path/to/your/process &
PID=$!
# -1000 is the minimum oom_score_adj and makes this PID exempt from the
# oom-killer entirely (writing a negative value needs root); a milder
# value such as -500 only makes it a less likely target.
echo -1000 > /proc/$PID/oom_score_adj
wait $PID
Sorry, I should have been clearer about the running memory. This VM has 64G, dynamic memory assigned. It routinely runs with low RAM use, around that 5-12 mark. Unless this process kicks off, it almost never uses close to the 64G available.
I will try that out, thanks!
dynamic memory assigned. It routinely runs with low RAM use, around that 5-12 mark.
My bad, I misunderstood: you meant that the VM has plenty of available RAM (roughly 64 GB minus the 5-12 GB in use), not that it only has 5-12 GB of RAM free.
Still, if the resident set size (RSS) of any process grows beyond what the kernel can accommodate after releasing everything it can (caches, reclaimable memory), the oom-killer goes in for the kill. If it doesn’t kill your (very memory-hungry) conversion process because of score hacking, it will kill something else.
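If you want to see how close you actually get, something like this will print the RSS of the conversion process once a second (just a sketch; it assumes the process is literally named convert):

# Print PID, resident set size (in KiB) and command name of every
# running "convert" once a second, so you can watch it grow.
while sleep 1; do
    ps -o pid=,rss=,comm= -C convert
done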
Convert probably reads the whole image into memory and then tries to convert it; try limiting image sizes. If more than one convert can run at a time, try to limit the number of instances, or look at a different conversion program that is more efficient. The issue behind the oom-killer is overcommit: the kernel hands out memory that doesn’t exist, then runs out.
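If the convert here is ImageMagick’s convert (an assumption on my part), it can also be told to cap its own memory appetite so it spills to disk instead of ballooning in RAM; the values below are just examples:

# Per-invocation resource caps (ImageMagick); numbers are examples only.
convert -limit memory 8GiB -limit map 16GiB big-input.tif output.png

# Or set the same caps for every invocation via the environment:
export MAGICK_MEMORY_LIMIT=8GiB
export MAGICK_MAP_LIMIT=16GiB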
Original post says 64, then 32.
I’m guessing you have not messed with any settings, but please check
sysctl vm.overcommit_memory
The VM is set for 64G and the process is killed when it reaches 32G.
When I run that command, here is the output: vm.overcommit_memory = 0
OK, your setting for ‘vm.overcommit_memory’ is the default on RHEL9; not sure about RHEL7 and RHEL8, I think it may have changed?
Doc here (kind of) explains the different settings
https://www.kernel.org/doc/html/v5.1/vm/overcommit-accounting.html
The default for ‘vm.overcommit_ratio’ is 50, which made me look at the ratio of 64 to 32: 50% of your 64G is exactly the 32G where the process dies.
Then ‘man 5 proc’ covers how to monitor:
CommitLimit
Committed_AS
but you need to check them while the system is heavily loaded
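A quick way to watch those two values while convert is running (just a sketch; note that CommitLimit is swap + RAM * overcommit_ratio / 100, so with 64G RAM, little swap, and the default ratio of 50 it works out to roughly 32G, although it is only strictly enforced when vm.overcommit_memory = 2):

# Show the commit accounting from /proc/meminfo, refreshed every second.
# CommitLimit ~= Swap + RAM * vm.overcommit_ratio / 100  (~32G here)
watch -n1 "grep -E 'CommitLimit|Committed_AS' /proc/meminfo"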
In addition, the RHEL 9.6 release notes, Chapter 4 (“Important changes to external kernel parameters”), list overcommit_memory among the changed sysctl parameters.
BUT, what has actually changed?
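One way to compare notes with the release docs is to dump every overcommit-related knob currently in effect:

# List all overcommit-related sysctls and their current values.
sysctl -a 2>/dev/null | grep -i overcommit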