Rocky Linux 9 is stuck

I ran a piece of Go code that continuously allocates memory. When memory ran out, the system froze and the process was not killed.

package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
	"time"
)

var holder = make([][]byte, 0)

func main() {

	blockSize := 100 * 1024 * 1024 // allocate 100 MB per iteration
	printInterval := time.Second

	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGTERM, syscall.SIGINT)

	ticker := time.NewTicker(printInterval)
	defer ticker.Stop()

	var totalAllocated uint64

	for {
		select {
		case <-ticker.C:
			var m runtime.MemStats
			runtime.ReadMemStats(&m)
			fmt.Printf("Allocated: %.2f GB (heap in use: %.2f GB)\n",
				float64(totalAllocated)/1024/1024/1024,
				float64(m.HeapAlloc)/1024/1024/1024)

		case sig := <-sigChan:
			fmt.Printf("Received signal: %v\n", sig)
			return

		default:
			buf := make([]byte, blockSize)
			for i := range buf {
				buf[i] = byte(i % 256)
			}
			holder = append(holder, buf)
			totalAllocated += uint64(blockSize)
		}
	}
}

Most likely a problem with your code that you need to fix. Not really a problem with Rocky Linux 🙂

He deliberately exhausted memory, expecting the system to kill the process, because the following situation might occur: a Kubernetes container exceeds its memory limit, but the process isn’t killed as expected.

Check the value of:

sysctl vm.overcommit_memory
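As a quick sketch, the policy can be read either via sysctl or directly from /proc (0, 1 and 2 are the only defined values):

```shell
# Read the overcommit policy: 0 = heuristic (default), 1 = always allow, 2 = never overcommit
sysctl vm.overcommit_memory
# The same value is exposed in /proc:
cat /proc/sys/vm/overcommit_memory
```

Changing it (e.g. `sysctl -w vm.overcommit_memory=2`) requires root and takes effect immediately.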

I recall the oom_killer in CentOS 4 (or 5) targeting system services rather than the rogue user process, but not since then.

When memory usage goes up, swap is taken into use. Swapping really slows the system to a “stuck”-like state until all of RAM+swap is in use and the oom_killer finally acts. Was the system “forever” stuck (oom_killer does nothing, or kills vital services), or just stuck for “a long time”?
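To see whether swapping is what makes the box feel stuck, the swap state can be watched while the test runs (a sketch using standard procps/util-linux tools):

```shell
# RAM and swap totals/usage at a glance
free -h
# Active swap devices/files
swapon --show
# Raw counters straight from the kernel
grep -E '^(SwapTotal|SwapFree)' /proc/meminfo
```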


Can one “contain” the containers with cgroups? That is how, for example, SLURM limits the memory usage of the jobs it runs. Then it would be the cgroups, not the oom_killer, that kill the contained processes, and the system overall would not run out of memory (as long as the cgroups are not given too much/all of it).
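A minimal sketch of such containment with cgroup v2 on Rocky 9, using systemd-run (MemoryMax maps onto the cgroup’s memory.max; `./memhog` stands in for the test binary and is an assumption):

```shell
# Run the memory hog in its own transient scope, capped at 1 GiB with no swap;
# the kernel then OOM-kills only that scope, not the whole system.
systemd-run --scope -p MemoryMax=1G -p MemorySwapMax=0 ./memhog
```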

The oom_killer doesn’t seem to do anything, because the kswapd process is occupying 100% of the CPU and the entire system gets stuck. Is there a solution?

The answer is pretty clear here: Why is kswapd process using 100% CPU on Red Hat Enterprise Linux? - Red Hat Customer Portal

Therefore, fix your program, since it is the cause of the problem. Or increase the RAM in your machine if your program is memory hungry.
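One knob worth checking in this situation (an assumption on my part, not something established in this thread) is vm.swappiness: with less aggressive swapping, or with swap disabled entirely, the OOM killer tends to fire sooner instead of kswapd thrashing for ages.

```shell
# How eagerly the kernel swaps (the EL9 default is typically 60)
cat /proc/sys/vm/swappiness
# Prefer reclaiming page cache over swapping anonymous pages (root required)
sudo sysctl -w vm.swappiness=10
# Or disable swap entirely so the OOM killer acts instead of thrashing
sudo swapoff -a
```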

I didn’t see a reply to my post from two days ago.

Setting vm.overcommit_memory=2 can indeed solve the problem, but it may waste part of the memory. I want to keep vm.overcommit_memory=0, so my question is: with vm.overcommit_memory=0, how can this problem be solved on Rocky Linux 9?
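The “wasted” memory under mode 2 comes from the commit limit: with vm.overcommit_memory=2 the kernel caps total commitments at swap + RAM × vm.overcommit_ratio/100 (50% by default), so allocations fail with ENOMEM before that limit is reached. The current limit and usage can be checked like this:

```shell
# CommitLimit is only enforced in mode 2; Committed_AS is what has been promised so far
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
sysctl vm.overcommit_ratio
```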

Ok, at least we know now, but you didn’t say any of that before.

Does anyone know what to do?

Why did CentOS and Rocky 8 not show this problem in the same test?

You mean: why did they behave differently?

Different kernel and whatnot.

Can you try launching six instances at once, e.g. using bash background ‘&’ and ‘wait’, or some other async launcher? Your stdout will be a bit muddled unless you add the pid as a prefix.

yes, why did they behave differently?

He already said why they behave differently:

I’ve just tested the oom killer in a Rocky 9.5 guest.

The VM guest only has 2 GB of RAM, but you can do the same test on bigger machines by changing 800MiB to something like 2GiB or more.

When I run this script, I see memory going down, down, down and intense swapping, then I see multiple OOM killer messages in dmesg. The OS did not crash or freeze; it just killed the bad processes.

#!/bin/bash
set -u
for i in {1..10}; do
	head -c 800MiB    /dev/zero | tail | sleep 10 &
	sleep 1
done
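While the script above runs, the kernel’s own paging counters show whether the system is thrashing (a sketch; run it repeatedly, e.g. under `watch -n1`):

```shell
# pswpin/pswpout count pages swapped in/out; pgmajfault counts major page faults
grep -E '^(pswpin|pswpout|pgmajfault)' /proc/vmstat
```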

Can you execute the Go code I posted above? The system got 100% stuck when I ran it on Rocky 9.2.

Why on 9.2? The only supported Rocky 9 is 9.5 (until 9.6 is released).

dnf up
systemctl reboot

and then check whether you can reproduce the issue.

No, you first need to check using the bash example above on your system, because we don’t know whether the issue is your system or your code. We know from the test above that the OOM killer works on a clean install of Rocky 9.5.