Rocky Linux 9.5 sometimes boots into emergency mode for no reason

Hi

I have a Rocky 9.5 server running on Linode that sometimes boots into emergency mode after installing updates. It's an internet-facing game server, so it checks for and installs system updates every day at 5 in the morning. A lot of the time it installs updates and then reboots (the script has a reboot check function) and everything is fine, but sometimes, for no apparent reason, it boots into emergency mode. Unfortunately it does not store old boot records.

Is there a way to configure Rocky so that, if it boots into emergency mode, it does a simple reboot after 10 minutes or so? If I trigger the reboot from the Linode web console, it boots fine.
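One way this is sometimes approached is a small systemd unit that is pulled in by emergency.target and reboots after a delay. The following is an untested sketch, not a confirmed fix: the unit name emergency-reboot.service, the helper function name and the 600 second delay are all illustrative assumptions, and it only helps when the real root filesystem is available in emergency mode (it does nothing for a failure inside the initramfs).

```shell
#!/bin/sh
# Untested sketch: write a unit that reboots the machine if it is still in
# emergency mode after a delay. The unit name "emergency-reboot.service" and
# the 600 second delay are illustrative choices, not systemd defaults.

write_emergency_reboot_unit() {
   # $1: directory to write the unit into (normally /etc/systemd/system)
   unit_dir="${1:-/etc/systemd/system}"
   mkdir -p "$unit_dir"
   cat > "$unit_dir/emergency-reboot.service" <<'EOF'
[Unit]
Description=Reboot if the system is still in emergency mode after 10 minutes
DefaultDependencies=no

[Service]
Type=simple
ExecStart=/bin/sh -c 'sleep 600; systemctl --force reboot'

[Install]
WantedBy=emergency.target
EOF
}
```

After running write_emergency_reboot_unit as root, the unit would still need "systemctl daemon-reload" and "systemctl enable emergency-reboot.service". Two caveats: if the underlying problem is permanent this produces a reboot loop, and anyone who logs in to the emergency shell to debug should run "systemctl stop emergency-reboot.service" first.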

Also, is there a way to keep the old boot logs? Since the game services are run by systemd, maybe it deletes old logs because the services have very verbose output. My journald.conf is:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it under the
#  terms of the GNU Lesser General Public License as published by the Free
#  Software Foundation; either version 2.1 of the License, or (at your option)
#  any later version.
#
# Entries in this file show the compile time defaults. Local configuration
# should be created by either modifying this file, or by creating "drop-ins" in
# the journald.conf.d/ subdirectory. The latter is generally recommended.
# Defaults can be restored by simply deleting this file and all drop-ins.
#
# Use 'systemd-analyze cat-config systemd/journald.conf' to display the full config.
#
# See journald.conf(5) for details.

[Journal]
#Storage=auto
#Compress=yes
#Seal=yes
#SplitMode=uid
#SyncIntervalSec=5m
#RateLimitIntervalSec=30s
#RateLimitBurst=10000
#SystemMaxUse=
#SystemKeepFree=
#SystemMaxFileSize=
#SystemMaxFiles=100
#RuntimeMaxUse=
#RuntimeKeepFree=
#RuntimeMaxFileSize=
#RuntimeMaxFiles=100
#MaxRetentionSec=
#MaxFileSec=1month
#ForwardToSyslog=no
#ForwardToKMsg=no
#ForwardToConsole=no
#ForwardToWall=yes
#TTYPath=/dev/console
#MaxLevelStore=debug
#MaxLevelSyslog=debug
#MaxLevelKMsg=notice
#MaxLevelConsole=info
#MaxLevelWall=emerg
#LineMax=48K
#ReadKMsg=yes
Audit=

How have you configured it to apply updates at 5am? A cronjob or something else?

Do you have scripts that check whether it needs a reboot? If so, please post the content of any scripts you use for the update and the reboot process.

I have a Linode account, so with more information from you I can check and test whether their (Linode’s) Rocky image is the problem. Also, what are the specs of the VM? CPU/RAM?

The Linode has 4 CPUs, 8 GB RAM and 160 GB storage. The update script is triggered by a cron job. The reboot function is:

function SF_SERVER_UPDATE() {
   SF_ROOT_REQUIRED
   echo "◉ systemupdate started."

   # start backup
   echo "◉ starting backup."
   SF_SERVER_BACKUP

   # prune backup
   echo "◉ pruning old backups."
   SF_BACKUP_PRUNE

   # checking dnf for updates
   echo "◉ checking dnf for updates."
   dnf update -y

   # check if reboot is required
   echo "◉ checking if reboot is required."
   REBOOT_REQUIRED=$(
      needs-restarting -r >/dev/null
      echo $?
   )

   if [ "$REBOOT_REQUIRED" -ge 1 ]; then
      echo "◉ reboot is required. sleeping for $SV_SLEEP before rebooting."
      sleep $SV_SLEEP # possible fix for reboot error, DO NOT REMOVE
      systemctl reboot now
   fi

   # restart services if necessary
   echo "◉ restarting services."
   SV_SERVICES_RESTARTING="$(needs-restarting -s)"
   SV_SERVICES_RESTARTING_EXITCODE="$?"
   if [ "$SV_SERVICES_RESTARTING_EXITCODE" -ge 1 ]; then
      echo "◉ restarting updated packages."
      systemctl restart "$SV_SERVICES_RESTARTING"
   fi
}

needs-restarting comes from the yum-utils package. RHEL 8/Rocky 8 and later use dnf by default, so the correct command to use in your script would be:

dnf needs-restarting --reboothint

That’s probably not the reason for booting into emergency mode, but it would be prudent to use dnf commands rather than rely on yum, which is effectively deprecated. yum is a symlink to dnf now anyway.

For the reboot, I’d tend to use:

shutdown -r now

rather than the systemd one.
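Put together, the two suggestions above might look roughly like this. It is only a sketch: the function name reboot_if_required is made up for illustration, and it assumes the dnf needs-restarting plugin is installed. The --reboothint option exits with 1 when a reboot is recommended and 0 when it is not.

```shell
#!/bin/sh
# Sketch only: reboot when dnf reports that a reboot is recommended.
# Assumes the dnf "needs-restarting" plugin is installed.

reboot_if_required() {
   # --reboothint exits with 1 when a reboot is recommended and 0 when not.
   status=0
   dnf needs-restarting --reboothint >/dev/null 2>&1 || status=$?

   if [ "$status" -eq 1 ]; then
      echo "reboot is required."
      shutdown -r now
   elif [ "$status" -eq 0 ]; then
      echo "no reboot required."
   else
      # Any other exit code means the check itself failed; do not reboot blindly.
      echo "could not determine reboot status (exit code $status)." >&2
      return "$status"
   fi
}
```

Checking for exactly 1 instead of "1 or more" means a failing dnf call (for example a broken repo or a lock) does not trigger a reboot by accident.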

I’ll make a VM and test whether I can replicate your problem though.

So far I’ve not managed to replicate your problem. I do get this:

but after about 30 seconds or so the server reboots fine anyway. However, the systemctl command that you use does give me this:

[root@linode ~]# systemctl reboot now
Positional argument to reboot command is deprecated, please use --reboot-argument= instead. Accepting anyway.

It may be prudent either to change the above to the correct parameter or to use the command I suggested, shutdown -r now, instead. Perhaps this is the reason for the instability. I used it and it rebooted fine, but I’m wondering whether this is what causes the reboot into emergency mode.

One additional thing I did on my VM was to stop crashkernel from reserving memory by default. First, edit /etc/kdump.conf and make sure it contains the line below:

auto_reset_crashkernel no

then disable kdump and mask it:

systemctl stop kdump
systemctl disable kdump
systemctl mask kdump

and then update the grub config not to use crashkernel:

grub2-mkconfig --update-bls-cmdline -o /boot/grub2/grub.cfg
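To confirm that the change took effect, the kernel command line can be checked for a crashkernel= entry. A small sketch follows; the helper name has_crashkernel is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether a kernel command line still contains crashkernel=.

has_crashkernel() {
   # $1: a kernel command line, e.g. the contents of /proc/cmdline
   case " $1 " in
      *" crashkernel="*) return 0 ;;
      *) return 1 ;;
   esac
}
```

For example: if has_crashkernel "$(cat /proc/cmdline)"; then echo "crashkernel is still set"; fi. Note that /proc/cmdline describes the running kernel, so it only changes after the next reboot; "grubby --info=ALL" shows the arguments configured for the installed kernels.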

Thanks. I wasn't aware that dnf can do the needs-restarting check itself. I always wondered why I only found the yum needs-restarting package and no dnf solution, but figured it was because the posts I copied it from were older and there was no newer way to do this.

I should also mention: this bug only happens about once every 3 months. I have not yet found a pattern to replicate it. Is there a way for journalctl to keep old boot records?
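On keeping old boot records: with Storage=auto (the default shown in the journald.conf above), journald only writes to disk if /var/log/journal exists; otherwise the journal lives under /run/log/journal and is lost on reboot. A minimal sketch for making it persistent is below. The function name, the drop-in file name persistent.conf and the 1G size cap are assumptions chosen for illustration, not values from this thread.

```shell
#!/bin/sh
# Sketch: make the systemd journal persistent via a drop-in.
# "persistent.conf" and SystemMaxUse=1G are illustrative choices.

enable_persistent_journal() {
   # $1: journald drop-in directory, $2: on-disk journal directory
   dropin_dir="${1:-/etc/systemd/journald.conf.d}"
   journal_dir="${2:-/var/log/journal}"
   mkdir -p "$dropin_dir" "$journal_dir"
   cat > "$dropin_dir/persistent.conf" <<'EOF'
[Journal]
Storage=persistent
SystemMaxUse=1G
EOF
}
```

After running enable_persistent_journal as root, "systemctl restart systemd-journald" applies it. From then on "journalctl --list-boots" should list earlier boots, and "journalctl -b -1" shows the previous one. A size cap matters here because the game services log verbosely, and journald rotates out the oldest entries once the cap is reached.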

Yeah, you can do both checks, using -r or -s like you did with the previous command, or the full parameters --reboothint and --services.
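For the services check, a rough sketch is below. It assumes the plugin prints one systemd unit name per line and decides based on that output rather than on the exit code; the function name restart_updated_services is made up for illustration.

```shell
#!/bin/sh
# Sketch only: restart the services that dnf reports as needing a restart.
# Assumes "dnf needs-restarting --services" prints one unit name per line.

restart_updated_services() {
   # Capture the list; an empty result means nothing needs restarting.
   services="$(dnf needs-restarting --services 2>/dev/null)" || true
   if [ -z "$services" ]; then
      echo "no services need restarting."
      return 0
   fi
   # Restart each unit separately; the whole list quoted as one argument
   # would be treated as a single (non-existent) unit name.
   printf '%s\n' "$services" | while IFS= read -r unit; do
      if [ -n "$unit" ]; then
         echo "restarting $unit"
         systemctl restart "$unit"
      fi
   done
}
```

Restarting everything on the list automatically can interrupt running sessions (the game services included), so it may be worth filtering the list before acting on it.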

Usually, if logrotate is installed, there shouldn’t be an issue with logs disappearing. I have a problem with a Rocky 9 server at OVH that doesn’t always shut down properly when rebooting, so I constantly have to go to the console and force a power-off. I’ve put it down to OVH’s dedicated hardware or their Rocky image being crap. The other dedicated server I have, which is the same spec but in a different datacenter, reboots fine every time.

I would be tempted though to change the systemctl reboot command to something else just in case.

Thanks, I added dnf needs-restarting to my script for both checks. I also use shutdown -r now; it was reboot now before, but I changed it because Linode sometimes notifies me that “lassie the linode watchdog” started my server, even though I used reboot and not shutdown.