Hi guys ^^,
I’m glad to post here, because I’ve read this forum many times and it has helped me a lot (I am fairly new to Rocky 9.3, but have years on CentOS 8.4).
This is the error I want to discuss with you. I’ll try to explain it as best I can.
I have a PCIe DAQ board that only powers on after the OS is rebooted, so I use a “.service” unit to do the reboot.
This is the error I see. It never happens after a reboot, only after a power-on, and even then not every time.
However, I don’t fully understand the problem and haven’t been able to solve it. It seems to be related to kdump on Rocky 9. In the first post somebody talks about “Docs”. Where are these docs?
I have disabled kdump.service, but I saw the error again. Can I solve it by leaving kdump out at OS installation?
I’m running into the same issue using the RockyLinux 9 AMI on AWS.
[rocky@i-0978b586c4c18d4f2 ~]$ sudo systemctl list-units --failed
  UNIT          LOAD   ACTIVE SUB    DESCRIPTION
● kdump.service loaded failed failed Crash recovery kernel arming
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
[rocky@i-0978b586c4c18d4f2 ~]$ sudo systemctl status kdump.service
× kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Tue 2024-06-18 15:47:44 UTC; 2min 59s ago
   Main PID: 1001 (code=exited, status=1/FAILURE)
        CPU: 40ms
Jun 18 15:47:44 i-0978b586c4c18d4f2.eu-west-1.compute.internal systemd[1]: Starting Crash recovery kernel arming...
Jun 18 15:47:44 i-0978b586c4c18d4f2.eu-west-1.compute.internal kdumpctl[1007]: kdump: No memory reserved for crash kernel
Jun 18 15:47:44 i-0978b586c4c18d4f2.eu-west-1.compute.internal kdumpctl[1007]: kdump: Starting kdump: [FAILED]
Jun 18 15:47:44 i-0978b586c4c18d4f2.eu-west-1.compute.internal systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
Jun 18 15:47:44 i-0978b586c4c18d4f2.eu-west-1.compute.internal systemd[1]: kdump.service: Failed with result 'exit-code'.
Jun 18 15:47:44 i-0978b586c4c18d4f2.eu-west-1.compute.internal systemd[1]: Failed to start Crash recovery kernel arming.
[rocky@i-0978b586c4c18d4f2 ~]$
This is how I look up the AMI:
data "aws_ami" "rocky_linux_9" {
  most_recent = true

  filter {
    name   = "name"
    values = ["Rocky-9-*x86_64-*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  owners = ["679593333241"]
}
The actual AMI I get with this is
AMI ID: ami-0cb9745e56da171c2
AMI Name: Rocky-9-EC2-LVM-9.4-20240523.0.x86_64-prod-hyj6jp3bki4bm
I’m not sure how to proceed. Appreciate your help!
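The “No memory reserved for crash kernel” line in the log above generally means the running kernel was booted without a crashkernel= argument, or with one that reserved nothing for this machine's memory size. A minimal sketch for checking that, assuming the usual /proc/cmdline and /sys/kernel/kexec_crash_size files are present; the function names crashkernel_arg and report_crashkernel are made up for illustration:

```shell
#!/bin/sh
# Print the value of the crashkernel= argument found in a kernel command
# line string, or nothing if the argument is absent.
# (Hypothetical helper; runs in a subshell so "set -f" stays local.)
crashkernel_arg() (
    set -f                      # do not glob-expand words of the command line
    for word in $1; do
        case "$word" in
            crashkernel=*) printf '%s\n' "${word#crashkernel=}" ;;
        esac
    done
)

# Report what the running kernel was booted with and how much memory is
# actually reserved. A reserved size of 0 matches the kdumpctl failure.
report_crashkernel() {
    if [ -r /proc/cmdline ]; then
        echo "crashkernel argument: $(crashkernel_arg "$(cat /proc/cmdline)")"
    fi
    if [ -r /sys/kernel/kexec_crash_size ]; then
        echo "reserved bytes: $(cat /sys/kernel/kexec_crash_size)"
    fi
}
```

Calling report_crashkernel on the affected instance shows whether the argument is missing entirely or present but not resulting in a reservation, which are two different things to fix.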
Now I also realize why I had to adjust my cloud-init: the block device layout changed between Rocky 9.3 and 9.4 as well. Is this intentional? (In the diff, old is 9.3 and new is 9.4.)
I had tested disabling kdump, but I haven’t tested “erasing” it:
grubby --update-kernel=ALL --args="crashkernel=no"
grub2-mkconfig -o /boot/grub2/grub.cfg
The error changed, and I saw that it was related to my “reboot”.
I have a .service file that does a reboot to work around a power problem on a DAQ PCI board. It seems that rebooting right after checking the board is bad timing.
I’m now testing a 5-10 s delay before the “reboot” command in the .service, and the failure seems to have disappeared.
Still testing.
It is not failing anymore.
My solution was to use a “sleep 5;” in the X.service I was using to reboot the PC. Something in the timing of the reboot command inside a .service was failing.
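A minimal sketch of that delay, assuming the unit's ExecStart= runs a small shell script. The names delayed_reboot, DELAY_SECONDS, and REBOOT_CMD are hypothetical and not taken from the actual X.service:

```shell
#!/bin/sh
# Hypothetical helper for a reboot unit: wait a few seconds, then reboot.
# Both settings can be overridden from the environment.
DELAY_SECONDS="${DELAY_SECONDS:-5}"          # the "sleep 5" from the post
REBOOT_CMD="${REBOOT_CMD:-systemctl reboot}" # assumed reboot command

delayed_reboot() {
    # Give the rest of the boot time to settle before asking for a reboot.
    sleep "$DELAY_SECONDS"
    # Left unquoted on purpose so a command with arguments is split into words.
    $REBOOT_CMD
}

# In the real script the last line would be the call itself:
# delayed_reboot
```

Setting REBOOT_CMD to something harmless such as "echo reboot" makes it possible to dry-run the timing without actually restarting the machine.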