Rocky Linux 9.5 sometimes boots into emergency mode for no apparent reason

Hi

I have a Rocky 9.5 server running on Linode that sometimes boots into emergency mode after installing updates. It's an internet-facing game server, so it checks for and installs system updates every day at 5 in the morning. Most of the time it installs the updates, reboots afterwards (the script has a reboot check), and everything is fine, but sometimes, for no apparent reason, it boots into emergency mode instead. Unfortunately it does not keep old boot records.

Is there a way to configure Rocky so that if it boots into emergency mode, it simply reboots again after 10 minutes or so? If I trigger a reboot over the Linode web console, it boots fine.

Also, is there a way to keep old boot logs? The game services are started by systemd and produce very verbose output, so maybe that is why old logs get deleted. My journald.conf is:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it under the
#  terms of the GNU Lesser General Public License as published by the Free
#  Software Foundation; either version 2.1 of the License, or (at your option)
#  any later version.
#
# Entries in this file show the compile time defaults. Local configuration
# should be created by either modifying this file, or by creating "drop-ins" in
# the journald.conf.d/ subdirectory. The latter is generally recommended.
# Defaults can be restored by simply deleting this file and all drop-ins.
#
# Use 'systemd-analyze cat-config systemd/journald.conf' to display the full config.
#
# See journald.conf(5) for details.

[Journal]
#Storage=auto
#Compress=yes
#Seal=yes
#SplitMode=uid
#SyncIntervalSec=5m
#RateLimitIntervalSec=30s
#RateLimitBurst=10000
#SystemMaxUse=
#SystemKeepFree=
#SystemMaxFileSize=
#SystemMaxFiles=100
#RuntimeMaxUse=
#RuntimeKeepFree=
#RuntimeMaxFileSize=
#RuntimeMaxFiles=100
#MaxRetentionSec=
#MaxFileSec=1month
#ForwardToSyslog=no
#ForwardToKMsg=no
#ForwardToConsole=no
#ForwardToWall=yes
#TTYPath=/dev/console
#MaxLevelStore=debug
#MaxLevelSyslog=debug
#MaxLevelKMsg=notice
#MaxLevelConsole=info
#MaxLevelWall=emerg
#LineMax=48K
#ReadKMsg=yes
Audit=
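
For what it's worth, a minimal sketch for keeping journal entries from older boots, assuming the defaults shown above are still in effect (with Storage=auto, entries only survive a reboot if /var/log/journal exists): create a drop-in such as /etc/systemd/journald.conf.d/persistent.conf (the file name is just an example) containing

[Journal]
Storage=persistent
# optional caps so verbose game services cannot fill the disk
SystemMaxUse=1G
MaxRetentionSec=1month

and then restart journald with systemctl restart systemd-journald (or just reboot).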

How have you configured it to apply updates at 5am? A cron job, or something else?

Do you have scripts that check whether it needs a reboot or not? If so, please post the content of any scripts you use for the update as well as for the reboot process.

I have a Linode account, so with more information from you I can check/test whether their (Linode's) Rocky image is the problem or not. Also, what are the specs of the VM? CPU/RAM?

The Linode has 4 CPUs / 8 GB RAM & 160 GB storage. The update script is triggered by a cron job. The update/reboot function is:

function SF_SERVER_UPDATE() {
   SF_ROOT_REQUIRED
   echo "◉ systemupdate started."

   # start backup
   echo "◉ starting backup."
   SF_SERVER_BACKUP

   # prune backup
   echo "◉ pruning old backups."
   SF_BACKUP_PRUNE

   # checking dnf for updates
   echo "◉ checking dnf for updates."
   dnf update -y

   # check if reboot is required
   echo "◉ checking if reboot is required."
   REBOOT_REQUIRED=$(
      needs-restarting -r >/dev/null
      echo $?
   )

   if [ "$REBOOT_REQUIRED" -ge 1 ]; then
      echo "◉ reboot is required. sleeping for $SV_SLEEP before rebooting."
      sleep $SV_SLEEP # possible fix for reboot error, DO NOT REMOVE
      systemctl reboot now
   fi

   # restart services if necessary
   echo "◉ restarting services."
   SV_SERVICES_RESTARTING="$(needs-restarting -s)"
   SV_SERVICES_RESTARTING_EXITCODE="$?"
   if [ "$SV_SERVICES_RESTARTING_EXITCODE" -ge 1 ]; then
      echo "◉ restarting updated packages."
      systemctl restart "$SERVICES_RESTARTING"
   fi
}
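
For completeness, the cron side is just an entry that runs the script at 5am; a hypothetical /etc/cron.d file (the path and script name below are made up) would look like:

# /etc/cron.d/sf-update -- hypothetical path and script name, adjust to your setup
0 5 * * * root /usr/local/sbin/sf-server.sh update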

needs-restarting comes from the yum-utils package. RHEL 8/Rocky 8 and later use dnf by default, so the equivalent command to use in your script would be:

dnf needs-restarting --reboothint

That's probably not the reason for the reboot problem, but it would be prudent to use dnf-related commands rather than rely on yum, which is effectively deprecated. yum is a symlink to dnf now anyway.
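
As a rough sketch (untested against your full script, and assuming dnf-utils is installed), the reboot check in your function could become something like:

   # dnf needs-restarting -r exits with status 1 when a reboot is required
   dnf needs-restarting -r >/dev/null
   REBOOT_REQUIRED=$?

   if [ "$REBOOT_REQUIRED" -ge 1 ]; then
      echo "◉ reboot is required. sleeping for $SV_SLEEP before rebooting."
      sleep "$SV_SLEEP"
      # reboot here; see the note on the reboot command below
   fi

with the service check using dnf needs-restarting -s in the same place as the existing needs-restarting -s call.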

For the reboot, I’d tend to use:

shutdown -r now

rather than the systemd one.

I'll make a VM and test whether I can replicate your problem though.

So far I’ve not managed to replicate your problem. I do get this:

but after about 30 seconds or so the server reboots fine anyway. However, using the systemctl command that you use does give me this:

[root@linode ~]# systemctl reboot now
Positional argument to reboot command is deprecated, please use --reboot-argument= instead. Accepting anyway.

It may be prudent to either change the above to the correct parameter, or use the command I suggested, shutdown -r now, instead. Perhaps this is the reason for the instability. I used it and it rebooted fine, but I'm just wondering if this is what is causing it to reboot into emergency mode.
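
For reference, either of these avoids the deprecated positional argument:

systemctl reboot
shutdown -r now

(as far as I can tell, the positional argument to systemctl reboot is treated as an argument for the reboot() system call, which is why systemd now wants --reboot-argument= instead).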

One additional note from something that I did on my VM: stop crashkernel from reserving memory by default. First, edit /etc/kdump.conf and ensure it contains the line below:

auto_reset_crashkernel no

then disable kdump and mask it:

systemctl stop kdump
systemctl disable kdump
systemctl mask kdump

and then update the grub config not to use crashkernel:

grub2-mkconfig --update-bls-cmdline -o /boot/grub2/grub.cfg
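
An alternative sketch for the same result, if you would rather not regenerate the config, is to strip the parameter from every installed kernel entry with grubby:

sudo grubby --update-kernel=ALL --remove-args="crashkernel"

and then check /proc/cmdline after the next reboot to confirm crashkernel= is gone.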

Thanks. I wasn't aware that dnf can do the needs-restarting check itself. I was always wondering why I only found the yum needs-restarting package and no dnf solution, but figured it must have been because the posts I copied it from were older and there was no newer way to do this.

I should also mention that this bug only happens about once every 3 months, and I have not yet found a pattern to replicate it. Is there a way for journalctl to keep old boot records?

Yeah, you can do both checks with dnf, using -r or -s like you did with the previous command, or the full parameters --reboothint or --services.

Usually, if logrotate is installed, there shouldn't be an issue with logs disappearing. I have a problem with an OVH server running Rocky 9 that doesn't always shut down properly when rebooting, and I constantly have to go on the console and force power off. I've put it down to OVH's dedicated hardware or their Rocky image being crap. The other dedicated server I have, same spec but in a different datacenter, reboots fine every time.
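
On the journal question, a quick way to see whether old boots are actually being kept is:

journalctl --list-boots       # one line per boot that is still stored
journalctl -b -1              # logs from the previous boot, if it was kept

If --list-boots only ever shows the current boot, the journal is not persisting across reboots, which would explain the missing records.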

I would be tempted though to change the systemctl reboot command to something else just in case.

Thanks, I added dnf needs-restarting to my script for both checks. I also use shutdown -r now; it was reboot now before, but I changed it because sometimes Linode notified me that "Lassie, the Linode watchdog" had started my server, even though I used reboot and not shutdown.

It happened again, but this time I was able to copy the /run/initramfs/rdsosreport.txt file. I hope this helps and somebody can figure out what the problem is.

To me it seems that, while in the initrd, it tries to mount your swap partition under /sysroot instead of your root.

[ 2.447853] rocky-eu-central systemd[1]: Starting File System Check on /dev/sda...
[ 2.454151] rocky-eu-central systemd-fsck[425]: sda: fsck.swap doesn't exist, not checking file system.
[ 2.454836] rocky-eu-central systemd[1]: Finished File System Check on /dev/sda.
[ 2.467324] rocky-eu-central systemd[1]: Mounting /sysroot...
[ 2.486338] rocky-eu-central mount[426]: mount: /sysroot: unknown filesystem type 'swap'.
[ 2.486679] rocky-eu-central systemd[1]: sysroot.mount: Mount process exited, code=exited, status=32/n/a
[ 2.486725] rocky-eu-central systemd[1]: sysroot.mount: Failed with result 'exit-code'.
[ 2.486926] rocky-eu-central systemd[1]: Failed to mount /sysroot.
[ 2.489357] rocky-eu-central systemd[1]: Dependency failed for Initrd Root File System.
[ 2.491448] rocky-eu-central systemd[1]: Dependency failed for Mountpoints Configured in the Real Root.

So check what grubby says about the root location with sudo grubby --info=DEFAULT.
It probably uses /dev/sda for root. The disk order seems to change from time to time, so the system sometimes sees your swap disk as sda.
Find out the UUID of the real root with blkid and refer to the root by UUID in the grub configuration:

sudo grubby --update-kernel=DEFAULT --args="root=UUID=123e4567-e89b-12d3-a456-426614174000"

Reboot the system to check if it boots properly.
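
A quick way to double-check after applying the change above:

sudo grubby --info=DEFAULT     # root= should now show the UUID instead of /dev/sda
cat /proc/cmdline              # after the next reboot, the running kernel should show root=UUID=...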

Thanks for letting me know. There must be some kind of boot device confusion. When I enter sudo grubby --info=DEFAULT it outputs:

index=0
kernel="/boot/vmlinuz-5.14.0-503.19.1.el9_5.x86_64"
args="ro console=ttyS0,19200n8 net.ifnames=0 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M"
root="/dev/sda"
initrd="/boot/initramfs-5.14.0-503.19.1.el9_5.x86_64.img"
title="Rocky Linux (5.14.0-503.19.1.el9_5.x86_64) 9.5 (Blue Onyx)"
id="c524ffd7d2694e56b74892531e19e67a-5.14.0-503.19.1.el9_5.x86_64"

lsblk -fm returns:

NAME FSTYPE FSVER LABEL       UUID                                 FSAVAIL FSUSE% MOUNTPOINTS   SIZE OWNER GROUP MODE
sda  ext4   1.0   linode-root 8e842459-e9a4-cbf9-29e7-1f5a91116e81   57.1G    58% /           159.5G root  disk  brw-rw----
sdb  swap   1                 7b44a66c-c83a-48f0-8540-0bf253149134                [SWAP]        512M root  disk  brw-rw----

So I guess the command

sudo grubby --update-kernel=DEFAULT --args="root=UUID=123e4567-e89b-12d3-a456-426614174000"

would break my VM, since I can't see the UUID 123e4567-e89b-12d3-a456-426614174000 anywhere.

Yes, the UUID I entered was meant as a placeholder. You should use the UUID of your root instead, 8e84…
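
Concretely, with the UUID from your lsblk output, that would be something like:

sudo grubby --update-kernel=DEFAULT --args="root=UUID=8e842459-e9a4-cbf9-29e7-1f5a91116e81"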

Thanks. I should have read your post entirely before I replied. I applied the change and I guess this solves the problem. Thanks everybody.

I will update the post in a few months to confirm whether it helped.

I am not sure whether this command will make sure the same root setting is applied to new kernels after an update. Maybe it will be necessary to set something in /etc/default/grub so that grub always uses the UUID when generating the entry for a new kernel. You can do your own research, or just check the grubby entry after the next kernel update.
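
A simple way to check that after the next kernel update (just a sketch) would be:

sudo grubby --info=ALL | grep -E "^(kernel|root|args)"
# every entry, including the newly installed kernel, should still show the root UUID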