Rocky Linux 9.5 sometimes boots into emergency mode for no apparent reason

Hi

I have a Rocky 9.5 server running on Linode that sometimes boots into emergency mode after installing updates. It's an internet-facing game server, so it checks for and installs system updates every day at 5 in the morning. Most of the time it installs the updates, reboots afterwards (the script has a reboot check), and everything is fine, but sometimes, for no apparent reason, it boots into emergency mode instead. Unfortunately it does not keep old boot records.

Is there a way to configure Rocky so that if it boots into emergency mode, it simply reboots again after 10 minutes or so? If I trigger a reboot over the Linode web console, it boots fine.

Also, is there a way to keep old boot logs? The game services are started by systemd and produce very verbose output, so maybe that is why old logs get deleted. My journald.conf is:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it under the
#  terms of the GNU Lesser General Public License as published by the Free
#  Software Foundation; either version 2.1 of the License, or (at your option)
#  any later version.
#
# Entries in this file show the compile time defaults. Local configuration
# should be created by either modifying this file, or by creating "drop-ins" in
# the journald.conf.d/ subdirectory. The latter is generally recommended.
# Defaults can be restored by simply deleting this file and all drop-ins.
#
# Use 'systemd-analyze cat-config systemd/journald.conf' to display the full config.
#
# See journald.conf(5) for details.

[Journal]
#Storage=auto
#Compress=yes
#Seal=yes
#SplitMode=uid
#SyncIntervalSec=5m
#RateLimitIntervalSec=30s
#RateLimitBurst=10000
#SystemMaxUse=
#SystemKeepFree=
#SystemMaxFileSize=
#SystemMaxFiles=100
#RuntimeMaxUse=
#RuntimeKeepFree=
#RuntimeMaxFileSize=
#RuntimeMaxFiles=100
#MaxRetentionSec=
#MaxFileSec=1month
#ForwardToSyslog=no
#ForwardToKMsg=no
#ForwardToConsole=no
#ForwardToWall=yes
#TTYPath=/dev/console
#MaxLevelStore=debug
#MaxLevelSyslog=debug
#MaxLevelKMsg=notice
#MaxLevelConsole=info
#MaxLevelWall=emerg
#LineMax=48K
#ReadKMsg=yes
Audit=
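
For what it's worth, a minimal sketch for keeping journal entries from older boots, assuming the defaults shown above are still in effect (with Storage=auto, entries only survive a reboot if /var/log/journal exists): create a drop-in such as /etc/systemd/journald.conf.d/persistent.conf (the file name is just an example) containing

[Journal]
Storage=persistent
# optional caps so verbose game services cannot fill the disk
SystemMaxUse=1G
MaxRetentionSec=1month

and then restart journald with systemctl restart systemd-journald (or just reboot).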

How have you configured it to apply updates at 5am? A cron job, or something else?

Do you have scripts that check whether it needs a reboot or not? If so, please post the content of any scripts you use for the update as well as for the reboot process.

I have a Linode account, so with more information from you I can check/test whether their (Linode's) Rocky image is the problem or not. Also, what are the specs of the VM? CPU/RAM?

The Linode has 4 CPUs / 8 GB RAM & 160 GB storage. The update script is triggered by a cron job. The update/reboot function is:

function SF_SERVER_UPDATE() {
   SF_ROOT_REQUIRED
   echo "◉ systemupdate started."

   # start backup
   echo "◉ starting backup."
   SF_SERVER_BACKUP

   # prune backup
   echo "◉ pruning old backups."
   SF_BACKUP_PRUNE

   # checking dnf for updates
   echo "◉ checking dnf for updates."
   dnf update -y

   # check if reboot is required
   echo "◉ checking if reboot is required."
   REBOOT_REQUIRED=$(
      needs-restarting -r >/dev/null
      echo $?
   )

   if [ "$REBOOT_REQUIRED" -ge 1 ]; then
      echo "◉ reboot is required. sleeping for $SV_SLEEP before rebooting."
      sleep $SV_SLEEP # possible fix for reboot error, DO NOT REMOVE
      systemctl reboot now
   fi

   # restart services if necessary
   echo "◉ restarting services."
   SV_SERVICES_RESTARTING="$(needs-restarting -s)"
   SV_SERVICES_RESTARTING_EXITCODE="$?"
   if [ "$SV_SERVICES_RESTARTING_EXITCODE" -ge 1 ]; then
      echo "◉ restarting updated packages."
      systemctl restart "$SERVICES_RESTARTING"
   fi
}
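
For completeness, the cron side is just an entry that runs the script at 5am; a hypothetical /etc/cron.d file (the path and script name below are made up) would look like:

# /etc/cron.d/sf-update -- hypothetical path and script name, adjust to your setup
0 5 * * * root /usr/local/sbin/sf-server.sh update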

needs-restarting comes from the yum-utils package. RHEL 8/Rocky 8 and later use dnf by default, so the equivalent command to use in your script would be:

dnf needs-restarting --reboothint

That's probably not the reason for the reboot problem, but it would be prudent to use dnf-related commands rather than rely on yum, which is effectively deprecated. yum is a symlink to dnf now anyway.
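
As a rough sketch (untested against your full script, and assuming dnf-utils is installed), the reboot check in your function could become something like:

   # dnf needs-restarting -r exits with status 1 when a reboot is required
   dnf needs-restarting -r >/dev/null
   REBOOT_REQUIRED=$?

   if [ "$REBOOT_REQUIRED" -ge 1 ]; then
      echo "◉ reboot is required. sleeping for $SV_SLEEP before rebooting."
      sleep "$SV_SLEEP"
      # reboot here; see the note on the reboot command below
   fi

with the service check using dnf needs-restarting -s in the same place as the existing needs-restarting -s call.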

For the reboot, I’d tend to use:

shutdown -r now

rather than the systemd one.

I'll make a VM and test whether I can replicate your problem though.

So far I’ve not managed to replicate your problem. I do get this:

but after about 30 seconds or so the server reboots fine anyway. However, using the systemctl command that you use does give me this:

[root@linode ~]# systemctl reboot now
Positional argument to reboot command is deprecated, please use --reboot-argument= instead. Accepting anyway.

It may be prudent to either change the above to the correct parameter, or use the command I suggested, shutdown -r now, instead. Perhaps this is the reason for the instability. I used it and it rebooted fine, but I'm just wondering if this is what is causing it to reboot into emergency mode.
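
For reference, either of these avoids the deprecated positional argument:

systemctl reboot
shutdown -r now

(as far as I can tell, the positional argument to systemctl reboot is treated as an argument for the reboot() system call, which is why systemd now wants --reboot-argument= instead).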

One additional note from something that I did on my VM: stop crashkernel from reserving memory by default. First, edit /etc/kdump.conf and ensure it contains the line below:

auto_reset_crashkernel no

then disable kdump and mask it:

systemctl stop kdump
systemctl disable kdump
systemctl mask kdump

and then update the grub config not to use crashkernel:

grub2-mkconfig --update-bls-cmdline -o /boot/grub2/grub.cfg
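
An alternative sketch for the same result, if you would rather not regenerate the config, is to strip the parameter from every installed kernel entry with grubby:

sudo grubby --update-kernel=ALL --remove-args="crashkernel"

and then check /proc/cmdline after the next reboot to confirm crashkernel= is gone.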

Thanks. I wasn't aware that dnf can do the needs-restarting check itself. I was always wondering why I only found the yum needs-restarting package and no dnf solution, but figured it must have been because the posts I copied it from were older and there was no newer way to do this.

I should also mention that this bug only happens about once every 3 months, and I have not yet found a pattern to replicate it. Is there a way for journalctl to keep old boot records?

Yeah, you can do both checks with dnf, using -r or -s like you did with the previous command, or the full parameters --reboothint or --services.

Usually, if logrotate is installed, there shouldn't be an issue with logs disappearing. I have a problem with an OVH server running Rocky 9 that doesn't always shut down properly when rebooting, and I constantly have to go on the console and force power off. I've put it down to OVH's dedicated hardware or their Rocky image being crap. The other dedicated server I have, same spec but in a different datacenter, reboots fine every time.
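
On the journal question, a quick way to see whether old boots are actually being kept is:

journalctl --list-boots       # one line per boot that is still stored
journalctl -b -1              # logs from the previous boot, if it was kept

If --list-boots only ever shows the current boot, the journal is not persisting across reboots, which would explain the missing records.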

I would be tempted though to change the systemctl reboot command to something else just in case.

Thanks, I added dnf needs-restarting to my script for both checks. I also use shutdown -r now; it was reboot now before, but I changed it because sometimes Linode notified me that "Lassie, the Linode watchdog" had started my server, even though I used reboot and not shutdown.

It happened again, but this time I was able to copy the /run/initramfs/rdsosreport.txt file. I hope this helps and somebody can figure out what the problem is.

To me it seems that, while in the initrd, it tries to mount your swap partition under /sysroot instead of your root.

[ 2.447853] rocky-eu-central systemd[1]: Starting File System Check on /dev/sda...
[ 2.454151] rocky-eu-central systemd-fsck[425]: sda: fsck.swap doesn't exist, not checking file system.
[ 2.454836] rocky-eu-central systemd[1]: Finished File System Check on /dev/sda.
[ 2.467324] rocky-eu-central systemd[1]: Mounting /sysroot...
[ 2.486338] rocky-eu-central mount[426]: mount: /sysroot: unknown filesystem type 'swap'.
[ 2.486679] rocky-eu-central systemd[1]: sysroot.mount: Mount process exited, code=exited, status=32/n/a
[ 2.486725] rocky-eu-central systemd[1]: sysroot.mount: Failed with result 'exit-code'.
[ 2.486926] rocky-eu-central systemd[1]: Failed to mount /sysroot.
[ 2.489357] rocky-eu-central systemd[1]: Dependency failed for Initrd Root File System.
[ 2.491448] rocky-eu-central systemd[1]: Dependency failed for Mountpoints Configured in the Real Root.

So check what grubby says about the root location with sudo grubby --info=DEFAULT.
It probably uses /dev/sda for root. The disk order seems to change from time to time, so the system sometimes sees your swap disk as sda.
Find out the UUID of the real root with blkid and refer to the root by UUID in the grub configuration:

sudo grubby --update-kernel=DEFAULT --args="root=UUID=123e4567-e89b-12d3-a456-426614174000"

Reboot the system to check if it boots properly.
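
A quick way to double-check after applying the change above:

sudo grubby --info=DEFAULT     # root= should now show the UUID instead of /dev/sda
cat /proc/cmdline              # after the next reboot, the running kernel should show root=UUID=...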

Thanks for letting me know. There must be some kind of boot device confusion. When I enter sudo grubby --info=DEFAULT it outputs:

index=0
kernel="/boot/vmlinuz-5.14.0-503.19.1.el9_5.x86_64"
args="ro console=ttyS0,19200n8 net.ifnames=0 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M"
root="/dev/sda"
initrd="/boot/initramfs-5.14.0-503.19.1.el9_5.x86_64.img"
title="Rocky Linux (5.14.0-503.19.1.el9_5.x86_64) 9.5 (Blue Onyx)"
id="c524ffd7d2694e56b74892531e19e67a-5.14.0-503.19.1.el9_5.x86_64"

lsblk -fm returns:

NAME FSTYPE FSVER LABEL       UUID                                 FSAVAIL FSUSE% MOUNTPOINTS   SIZE OWNER GROUP MODE
sda  ext4   1.0   linode-root 8e842459-e9a4-cbf9-29e7-1f5a91116e81   57.1G    58% /           159.5G root  disk  brw-rw----
sdb  swap   1                 7b44a66c-c83a-48f0-8540-0bf253149134                [SWAP]        512M root  disk  brw-rw----

So I guess the command

sudo grubby --update-kernel=DEFAULT --args="root=UUID=123e4567-e89b-12d3-a456-426614174000"

would break my VM, since I can't see the UUID 123e4567-e89b-12d3-a456-426614174000 anywhere.

Yes, the UUID I entered was meant as a placeholder. You should use the UUID of your root instead, 8e84…
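
Concretely, with the UUID from your lsblk output, that would be something like:

sudo grubby --update-kernel=DEFAULT --args="root=UUID=8e842459-e9a4-cbf9-29e7-1f5a91116e81"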

Thanks. I should have read your post entirely before I replied. I applied the change and I guess this solves the problem. Thanks everybody.

I will update the post in a few months to confirm whether it helped.

I am not sure whether this command will make sure the same root setting is applied to new kernels after an update. Maybe it will be necessary to set something in /etc/default/grub so that grub always uses the UUID when generating the entry for a new kernel. You can do your own research, or just check the grubby entry after the next kernel update.
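
A simple way to check that after the next kernel update (just a sketch) would be:

sudo grubby --info=ALL | grep -E "^(kernel|root|args)"
# every entry, including the newly installed kernel, should still show the root UUID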