Reboot leaves machine in emergency state - many start-limit-hit

You’re going to have to provide a lot more information from your logs to figure that problem out. Info from a single systemd service failure isn’t enough. (I edited your post to format it properly and make it easier to read, which matters especially when posting output from console commands.)

Check all your log files and watch the boot process to see what is failing and what errors appear. Usually, before you end up in maintenance mode (the prompt where you press CTRL-D to continue), there should be some hint at what is going wrong.
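For anyone following along, these generic commands on a systemd machine surface those boot-time hints (nothing here is specific to this server; "|| true" just keeps the sequence going if a command is unavailable):

```shell
# Generic post-mortem commands on a systemd machine (run as root).
journalctl -b -1 -p err --no-pager || true   # errors from the previous boot (needs a persistent journal)
journalctl -b -p warning --no-pager || true  # warnings and worse from the current boot
systemctl --failed --no-pager || true        # units currently in the failed state
```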

Well, I just simulated a power failure by forcing my Rocky server off, and when I powered it back on it came up perfectly fine. Why yours experienced problems is impossible to say now that you have reinstalled, without seeing the boot process or the errors on screen. I’ve had file system errors after a power failure before, and running fsck would fix them and get you running again. It still requires intervention, but that is entirely normal after a power failure and should be expected. Sometimes you are lucky and it will boot normally, sometimes not.

Thanks for your reply iwalker!

Forget about the power-loss issue, which originally triggered this thread.
That is NOT the problem, as it turns out.
Let me summarize the problem:

  • EVERY “systemctl reboot” boots the server into the emergency state!!
  • a controlled “shutdown -h 0”, followed by switching the power supply on the back of the server off and on again (I have “power on after power loss” enabled), SOMETIMES boots up correctly WITHOUT any errors, but about 1 out of 3 times it boots into the emergency state…
  • I suspected an EFI/BIOS issue, so I updated the EFI/BIOS to the latest version and made sure only a normal boot is allowed (no “fast boot”). All “performance tweak options” are off. I use standard options (except for power on after a power outage).
  • when I have changed a setting in the BIOS, it ALWAYS boots up correctly afterwards.
    I find the behaviour totally unreliable, which I cannot accept for a server that should run 24/7/365 and at least try to reboot after a power outage.
    I played with many EFI/BIOS settings, but I cannot find any combination that repeatedly boots up correctly.
    When there is a boot problem, it can be many different failures (in red), and no single one is always the first (red line). This is shown by “journalctl -xb”:
    systemd-hwdb-update.service
    systemd-binfmt.service
    iscsi-onboot.service
    ostree-remount.service
    systemd-sysusers.service

    Checking with “mdadm --detail” on all my RAID devices shows all arrays are “clean”.

I am very grateful for any hints on where to start. My first experience with Rocky Linux is unfortunately not as good as with CentOS, which I used for more than 10 years.
I simply can’t understand why a simple reboot should lead to an emergency state after normal operation.
/Ralf

Can you try something slightly different. I never use systemctl to reboot a system, can you as the root user just do:

reboot

and press enter. Does it reboot cleanly?

Also, for your shutdown, can you try as root:

shutdown now

and then power up your system, does it start?

You say changing BIOS settings makes your system boot properly? Can you check if there are any BIOS updates for your computer? It seems strange that changing a BIOS setting triggers it to boot normally; if so, that could hint at a problem with your BIOS rather than with Rocky. It seems unstable to me. I’ve only ever had BIOS issues with memory settings: with AMIBIOS, everything was fine using the “optimal settings” option, but when I tried to match the memory speed in the BIOS to what was printed on the RAM itself, it became unstable. Admittedly, the difference between the settings chosen by “optimal” and what I wanted my memory to actually run at was very small, so it wouldn’t have slowed my machine down much anyway.

I have tried “reboot”, which gives the same result (emergency state), and “shutdown now”, which is afaik the same as “shutdown -h 0” (= halt system in 0 sec). Both give the same result.
ONLY when I, after a complete and clean shutdown, switch off the power supply unit (switch on back of computer), there is a chance it will boot properly. But not in all cases…
This is what makes me frustrated and suspecting UEFI/BIOS issues.
As I wrote in the previous post, I therefore updated the UEFI/BIOS on the server to the latest version from January 2022. Unfortunately, this doesn’t make any difference. When initiating a “reboot” (using similar reboot commands), the server ALWAYS ends up in the emergency state.
My server has a new ASUS Prime H570-Plus motherboard.
I don’t think that memory speed settings can have an influence, as the server can run without any problems after a specific start-up procedure (clean shutdown + power-unit switch off/on).
I will search more on any known issues with my motherboard.
I have also tried this hint to avoid the start-up problem related to ‘start-limit-hit’:
https://www.suse.com/support/kb/doc/?id=000019750
It did not make any difference.
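For reference, the mechanism such articles describe is a systemd drop-in that relaxes the unit's start-rate limit. A hedged sketch (the unit name is only an example; on the live system ETC would be /etc/systemd/system, but it defaults to a local directory here so the sketch is safe to try):

```shell
# Sketch of a start-limit drop-in. ETC is a stand-in for
# /etc/systemd/system so this can be run without touching a live system.
ETC="${ETC:-./demo-etc}"
mkdir -p "$ETC/loadmodules.service.d"
cat > "$ETC/loadmodules.service.d/50-start-limit.conf" <<'EOF'
[Unit]
# disable the default "too many starts in a short interval" rate limit
StartLimitIntervalSec=0
EOF
# On the real system, follow up with:
#   systemctl daemon-reload && systemctl restart loadmodules
```

Note this only suppresses the rate limiter; it does not fix whatever makes the unit fail repeatedly in the first place.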
Thanks again for your feedback! Any help is appreciated!!

The original post doesn’t say if the server was rebooting perfectly before you pulled the power cord (and damaged it). The formatting of the error messages in the original post is terrible, but there is mention of a “degraded” array. Have you tried commenting out all “non o/s” entries in /etc/fstab and rebooting it?

Hi gerry666uk.
Sorry for the formatting… I’ll try to do better this time.
Yes, the original post didn’t mention the booting issues, as I didn’t notice them (at that time).
Yes, it says “degraded”, but when I check

# mdadm --detail /dev/mdXXX , XXX= the number

there are no issues. All arrays are “clean”! Is there perhaps a problem with the system not shutting down cleanly (it takes a very long time)? And then during start-up the arrays are initially “degraded” but very quickly repaired? Is there a timing issue?
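As a side note, degraded-vs-clean can be checked mechanically: in /proc/mdstat a healthy two-disk mirror shows "[UU]", and a missing member appears as an underscore. A small sketch (the helper name is made up for illustration):

```shell
# Report whether any md array in an mdstat-style file has a dropped member.
# "[UU]" means both mirror halves are present; "_" marks a missing device.
check_mdstat() {
  if grep -E '\[[U_]*_[U_]*\]' "$1" >/dev/null; then
    echo "DEGRADED"
  else
    echo "ALL CLEAN"
  fi
}
# On a live system: check_mdstat /proc/mdstat
```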

I have no “non o/s” entries, see below.

UUID=a5d17030-02ad-4ca5-beea-4ad98fb84b06 /                       xfs     defaults        0 0
UUID=22631df5-1226-43c1-b260-d5b262cc4cdd /boot                   xfs     defaults        0 0
UUID=CA1D-CBFB          /boot/efi               vfat    umask=0077,shortname=winnt 0 2
UUID=dc4dfbf8-d660-41f9-be4b-784ad7c2c785 /home                   xfs     defaults        0 0
UUID=08e15db2-43f5-4a5c-8947-3ca66b5ccd2a /mnt/video              xfs     defaults        0 0
UUID=5a6fde69-fdba-465f-bb8a-55d40606d698 /var/surgemail          xfs     defaults        0 0
UUID=261bb57a-ecb7-4e10-9b0d-5a1ed5486634 /var/www                xfs     defaults        0 0
UUID=6d661f21-c266-4695-9bda-a86d63564e88 none                    swap    defaults        0 0

The thing that seems to trigger the failure (when going through the lines in “journalctl -xb”) is this (for all services that fail):

The unit loadmodules.service has entered the 'failed' state with result 'start-limit-hit'.
apr 16 17:37:04 server.hartings.se systemd[1]: Failed to start Load legacy module configuration.

When I try to reset and restart the service, I get:

● loadmodules.service - Load legacy module configuration
   Loaded: loaded (/usr/lib/systemd/system/loadmodules.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Sat 2022-04-16 17:45:15 CEST; 11s ago
           ├─ ConditionPathExists=|/etc/rc.modules was not met
           └─ ConditionDirectoryNotEmpty=|/etc/sysconfig/modules was not met

apr 16 17:37:04 server.hartings.se systemd[1]: loadmodules.service: Start request repeated too quickly.
apr 16 17:37:04 server.hartings.se systemd[1]: loadmodules.service: Failed with result 'start-limit-hit'.
apr 16 17:37:04 server.hartings.se systemd[1]: Failed to start Load legacy module configuration.

I cannot restart the failing services afterwards.
I checked, and there is no /etc/rc.modules path, and the directory /etc/sysconfig/modules is indeed empty!

I have no clue how to continue. Any hints, or directions on where to start looking, are appreciated!
And I hope the formatting is better now :slight_smile:

Try creating the /etc/rc.modules directory.

mkdir /etc/rc.modules

then:

systemctl reset-failed
systemctl restart loadmodules

I would also check /var/log/messages and see if there are any warnings/errors in there. As for the array, you can also check:

cat /proc/mdstat

Thanks again iwalker!!

I did what you suggested and I can now restart loadmodules.service!!
But how could the directory /etc/rc.modules disappear??

[root@server ~]# mkdir /etc/rc.modules
[root@server ~]# systemctl reset-failed
[root@server ~]# systemctl restart loadmodules
[root@server ~]# systemctl status loadmodules
● loadmodules.service - Load legacy module configuration
   Loaded: loaded (/usr/lib/systemd/system/loadmodules.service; enabled; vendor>
   Active: active (exited) since Sat 2022-04-16 18:50:23 CEST; 11s ago
  Process: 11060 ExecStart=/usr/libexec/loadmodules (code=exited, status=0/SUCC>
 Main PID: 11060 (code=exited, status=0/SUCCESS)

apr 16 18:50:23 server.hartings.se systemd[1]: Starting Load legacy module conf>
apr 16 18:50:23 server.hartings.se systemd[1]: Started Load legacy module confi>

Regarding the status of the arrays:

[root@server ~]# cat /proc/mdstat
Personalities : [raid1] 
md120 : active raid1 sdd3[1] sdc3[0]
      48826368 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md121 : active raid1 sdd2[1] sdc2[0]
      439451648 blocks super 1.2 [2/2] [UU]
      bitmap: 1/4 pages [4KB], 65536KB chunk

md122 : active raid1 sdd1[1] sdc1[0]
      488184832 blocks super 1.2 [2/2] [UU]
      bitmap: 0/4 pages [0KB], 65536KB chunk

md123 : active raid1 sdb4[1] sda4[0]
      52160 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md124 : active raid1 sdb2[1] sda2[0]
      157285376 blocks super 1.2 [2/2] [UU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

md125 : active raid1 sdb3[1] sda3[0]
      585728 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdb1[1] sda1[0]
      319814656 blocks super 1.2 [2/2] [UU]
      bitmap: 1/3 pages [4KB], 65536KB chunk

md127 : active raid1 sdb5[1] sda5[0]
      10373120 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>
[root@server ~]#

They seem just fine to me.

It probably isn’t all that important:

[root@rocky ~]# systemctl status loadmodules
● loadmodules.service - Load legacy module configuration
   Loaded: loaded (/usr/lib/systemd/system/loadmodules.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Sat 2022-04-16 19:06:12 CEST; 9s ago
           ├─ ConditionPathExists=|/etc/rc.modules was not met
           └─ ConditionDirectoryNotEmpty=|/etc/sysconfig/modules was not met

mine doesn’t have it either, but the service is for loading legacy (old) modules; therefore, if no legacy modules are installed, the fact that the service doesn’t start isn’t that important. Rather, something else is the cause of your problems.

I would check /var/log/messages for anything with warning/errors.
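A quick way to do that scan (the path is the one from the thread; the helper name is made up):

```shell
# Show the most recent log lines that mention errors, warnings, or failures.
scan_log() {
  grep -Ein 'error|warn|fail' "$1" | tail -n 40
}
# e.g. scan_log /var/log/messages
```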

[root@server ~]# less /var/log/messages | grep warning
Apr 13 21:05:51 server /usr/libexec/gdm-x-session[2971]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 13 21:06:42 server /usr/libexec/gdm-x-session[5292]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 14 14:30:23 server1 /usr/libexec/gdm-x-session[3163]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 14 14:30:33 server1 /usr/libexec/gdm-x-session[6612]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 14 15:53:32 server1 /usr/libexec/gdm-x-session[2986]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 14 15:54:06 server1 /usr/libexec/gdm-x-session[3956]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 14 16:07:33 server1 /usr/libexec/gdm-x-session[2988]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 14 16:08:09 server1 /usr/libexec/gdm-x-session[6924]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 14 16:26:46 server /usr/libexec/gdm-x-session[2996]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.

And many more such lines, but this doesn’t seem to be the problem.
I tried to find a section which seems to be related to a boot, which I see several times in the messages file:

Apr 16 11:55:26 server kernel: pcieport 0000:00:1b.0: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Apr 16 11:55:29 server alsactl[1697]: alsa-lib parser.c:242:(error_node) UCM is not supported for this HDA model (HDA Intel PCH at 0x6001110000 irq 131)
Apr 16 11:55:29 server alsactl[1697]: alsa-lib main.c:1405:(snd_use_case_mgr_open) error: failed to import hw:0 use case configuration -6
Apr 16 11:55:30 server vncserver-x11[1993,root]: HostedRendezvous: Rendezvous lookup failed: Hosted Bootstrap error: Network failure: Error connecting: getaddrinfo: 'Name or service not known' (-2)
Apr 16 11:55:36 server /usr/libexec/gdm-x-session[2966]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 16 11:55:37 server /usr/libexec/gdm-x-session[2966]: (EE) AIGLX error: Calling driver entry point failed
Apr 16 11:58:32 server kernel: pcieport 0000:00:1b.0: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Apr 16 11:58:35 server alsactl[1681]: alsa-lib parser.c:242:(error_node) UCM is not supported for this HDA model (HDA Intel PCH at 0x6001110000 irq 131)
Apr 16 11:58:35 server alsactl[1681]: alsa-lib main.c:1405:(snd_use_case_mgr_open) error: failed to import hw:0 use case configuration -6
Apr 16 11:58:36 server vncserver-x11[1991,root]: HostedRendezvous: Rendezvous lookup failed: Hosted Bootstrap error: Network failure: Error connecting: getaddrinfo: 'Name or service not known' (-2)
Apr 16 11:58:42 server /usr/libexec/gdm-x-session[2971]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 16 11:58:43 server /usr/libexec/gdm-x-session[2971]: (EE) AIGLX error: Calling driver entry point failed
Apr 16 11:59:46 server /usr/libexec/gdm-x-session[5334]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Apr 16 11:59:47 server /usr/libexec/gdm-x-session[5334]: (EE) AIGLX error: Calling driver entry point failed
Apr 16 11:59:57 server cupsd[1983]: REQUEST localhost - - "POST / HTTP/1.1" 200 151 Cancel-Subscription client-error-not-found
Apr 16 11:59:57 server journal[3996]: gnome-shell: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Apr 16 11:59:57 server journal[4101]: gsd-color: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Apr 16 11:59:57 server journal[4109]: gsd-media-keys: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Apr 16 11:59:57 server journal[4112]: gsd-power: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Apr 16 11:59:57 server journal[4145]: gsd-wacom: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.

Is there any significant info here? I can’t see it…

I tried to reset and restart a couple of the services that failed during the boot process, but I cannot get them to work afterwards. For example:

[root@server ~]# systemctl status systemd-hwdb-update.service
● systemd-hwdb-update.service - Rebuild Hardware Database
   Loaded: loaded (/usr/lib/systemd/system/systemd-hwdb-update.service; static; vendor preset: disabled)
   Active: inactive (dead)
Condition: start condition failed at Sat 2022-04-16 18:34:34 CEST; 1h 56min ago
           ├─ ConditionNeedsUpdate=/etc was not met
           └─ ConditionDirectoryNotEmpty=|/etc/udev/hwdb.d was not met
     Docs: man:hwdb(7)
           man:systemd-hwdb(8)

apr 16 18:33:17 server.hartings.se systemd[1]: systemd-hwdb-update.service: Start request repeated too quickly.
apr 16 18:33:17 server.hartings.se systemd[1]: systemd-hwdb-update.service: Failed with result 'start-limit-hit'.
apr 16 18:33:17 server.hartings.se systemd[1]: Failed to start Rebuild Hardware Database.
[root@server ~]# systemctl reset-failed systemd-hwdb-update.service
[root@server ~]# systemctl start systemd-hwdb-update.service
[root@server ~]# systemctl status systemd-hwdb-update.service
● systemd-hwdb-update.service - Rebuild Hardware Database
   Loaded: loaded (/usr/lib/systemd/system/systemd-hwdb-update.service; static; vendor preset: disabled)
   Active: inactive (dead)
Condition: start condition failed at Sat 2022-04-16 20:31:29 CEST; 10s ago
           ├─ ConditionNeedsUpdate=/etc was not met
           └─ ConditionDirectoryNotEmpty=|/etc/udev/hwdb.d was not met
     Docs: man:hwdb(7)
           man:systemd-hwdb(8)

apr 16 18:33:17 server.hartings.se systemd[1]: systemd-hwdb-update.service: Start request repeated too quickly.
apr 16 18:33:17 server.hartings.se systemd[1]: systemd-hwdb-update.service: Failed with result 'start-limit-hit'.
apr 16 18:33:17 server.hartings.se systemd[1]: Failed to start Rebuild Hardware Database.
[root@server ~]#

I really don’t know what to do anymore…
I did a controlled shutdown, powered off, and restarted the server, and it did start properly this time…

I think I need some more hints and some rest now to get my mind straight again.

Wow, I just noticed that the /var/log/messages file contains info on my old server, called “server1” (see the start of this last post). I have no idea where that comes from… Have I mixed up some config files containing the name of my old server (I did copy some config files from my old server)? Which files would these likely be? Could this be a reason for my problems?

At this point, with no good way to know what got buggered when you crash-stopped the machine, and with what is now an obviously unstable system, I think the time has come to reformat it and start over again with a fresh installation.

And please don’t yank the power cable like that again.

Yes, it could be, especially if your previous server was not Rocky. It is best not to replace config files with ones from older systems. The best method is to use diff on them, or to remember what changes you need and apply them to the new config files. For example, let’s assume you have httpd.conf from your old server and httpd.conf from your new server. Copy the old httpd.conf to httpd-old.conf on the new install, and then run diff:

diff httpd-old.conf httpd.conf

you can then see what changes were made. It’s not foolproof, since other changes in the file, such as comments added or amended, will also show up, but it can give you a good idea.
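To run that comparison across a whole set of copied configs, a loop along these lines works (the function name and directory arguments are hypothetical):

```shell
# Diff every *.conf in an "old" directory against its counterpart in a
# "new" directory; review the output before reusing any old file.
diff_configs() {
  old_dir=$1 new_dir=$2
  for f in "$old_dir"/*.conf; do
    [ -e "$f" ] || continue
    base=$(basename "$f")
    echo "== $base =="
    diff -u "$new_dir/$base" "$f" || true
  done
}
# e.g. diff_configs /root/old-configs /etc/httpd/conf
```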

As in the previous post by Frank, I also suggest completely reinstalling a clean system and applying your configuration changes to the new files rather than copying them over.

Thanks all for your comments and help! I have now reinstalled Rocky Linux from scratch… twice.
After the fresh install (completely wiped out everything on the disks), I only did a “dnf update”. Nothing else.
When the problems persisted, I stopped testing and reset the motherboard CMOS (removing the battery and short-circuiting the two pins on the motherboard). I ran Memtest86 on the RAM (2x 8 GB) and it passed all four runs without any error reported. The first reboot went fine. Then I did a second reboot. I ended up with a screen saying:

GPT header corruption has been detected. Please check SATA mode setting in BIOS Setup, or you can use [Boot Sector (MBR/GPT) Recovery Policy] item under [Boot Configuration] page to recover GPT header.
Press F1 to Run SETUP

I do so and change in the BIOS:

[Boot Sector (MBR/GPT) Recovery Policy]  >> auto-recover mode

Third reboot: Same problem again: I end up in the emergency state. While there, I checked the disks with

# fsck /dev/sda
command: v
No errors detected
Header version: 1.0
Using 5 out of 128 partitions.
Xxxxx

Same for /dev/sdb (in Raid1 with sda). Disks seem OK.
mdadm --detail shows all arrays are clean.
I put SELinux in “disabled” mode, just to make sure this isn’t part of the problems. Reboot again >> same problem.

From /var/log/messages I read:

Apr 18 12:38:23 server kernel: pcieport 0000:00:1b.0: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Apr 18 12:39:07 server journal[3682]: gsd-media-keys: Fatal IO error 11 (Resource temporarily unavailable) on X server :1024.
Apr 18 12:39:07 server journal[3676]: gsd-color: Fatal IO error 11 (Resource temporarily unavailable) on X server :1024.
Apr 18 12:39:07 server journal[3683]: gsd-power: Fatal IO error 11 (Resource temporarily unavailable) on X server :1024.
Apr 18 12:39:07 server journal[3681]: gsd-keyboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :1024. 

More checking of logs:

[root@server ~]# journalctl -xb | grep warning
[root@server ~]# journalctl -xb | grep failure
Apr 18 12:38:26 server.hartings.se augenrules[1670]: failure 1
Apr 18 12:38:26 server.hartings.se augenrules[1670]: failure 1
Apr 18 12:38:26 server.hartings.se augenrules[1670]: failure 1

[root@server ~]# journalctl -xb | grep error
Apr 18 12:38:23 server.hartings.se kernel: pcieport 0000:00:1b.0: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Apr 18 12:39:07 server.hartings.se gsd-media-keys[3682]: gsd-media-keys: Fatal IO error 11 (Resource temporarily unavailable) on X server :1024.
Apr 18 12:39:07 server.hartings.se gsd-color[3676]: gsd-color: Fatal IO error 11 (Resource temporarily unavailable) on X server :1024.
Apr 18 12:39:07 server.hartings.se gsd-power[3683]: gsd-power: Fatal IO error 11 (Resource temporarily unavailable) on X server :1024.
Apr 18 12:39:07 server.hartings.se gsd-keyboard[3681]: gsd-keyboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :1024.
[root@server ~]#

[root@server ~]# journalctl -xb | grep failed
Apr 18 12:38:24 server.hartings.se systemd-vconsole-setup[975]: KD_FONT_OP_GET failed while trying to get the font metadata: Invalid argument
Apr 18 12:38:25 server.hartings.se systemd-vconsole-setup[1030]: KD_FONT_OP_GET failed while trying to get the font metadata: Invalid argument
-- Subject: Unit failed
-- The unit ostree-remount.service has entered the 'failed' state with result 'start-limit-hit'.
-- Subject: Unit ostree-remount.service has failed
-- Unit ostree-remount.service has failed.
-- The result is failed.
-- Subject: Unit failed
-- The unit systemd-firstboot.service has entered the 'failed' state with result 'start-limit-hit'.
-- Subject: Unit systemd-firstboot.service has failed
-- Unit systemd-firstboot.service has failed.
-- The result is failed.
-- Subject: Unit failed
-- The unit systemd-sysusers.service has entered the 'failed' state with result 'start-limit-hit'.
-- Subject: Unit systemd-sysusers.service has failed
-- Unit systemd-sysusers.service has failed.
-- The result is failed.
-- Subject: Unit failed
-- The unit iscsi-onboot.service has entered the 'failed' state with result 'start-limit-hit'.
-- Subject: Unit iscsi-onboot.service has failed
-- Unit iscsi-onboot.service has failed.
-- The result is failed.
-- Subject: Unit failed
-- The unit loadmodules.service has entered the 'failed' state with result 'start-limit-hit'.
-- Subject: Unit loadmodules.service has failed
-- Unit loadmodules.service has failed.
-- The result is failed.
-- Subject: Unit failed
-- The unit systemd-binfmt.service has entered the 'failed' state with result 'start-limit-hit'.
-- Subject: Unit systemd-binfmt.service has failed
-- Unit systemd-binfmt.service has failed.
-- The result is failed.
-- Subject: Unit failed
-- The unit systemd-hwdb-update.service has entered the 'failed' state with result 'start-limit-hit'.
-- Subject: Unit systemd-hwdb-update.service has failed
-- Unit systemd-hwdb-update.service has failed.
-- The result is failed.
-- Subject: Unit failed
-- The unit systemd-ask-password-console.path has entered the 'failed' state with result 'start-limit-hit'.
-- Subject: Unit systemd-ask-password-console.path has failed
-- Unit systemd-ask-password-console.path has failed.
-- The result is failed.
Apr 18 12:38:41 server.hartings.se /usr/libexec/gdm-wayland-session[3217]: dbus-daemon[3229]: [session uid=42 pid=3229] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Apr 18 12:38:41 server.hartings.se /usr/libexec/gdm-wayland-session[3217]: dbus-daemon[3229]: [session uid=42 pid=3229] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Apr 18 12:38:41 server.hartings.se /usr/libexec/gdm-wayland-session[3217]: dbus-daemon[3229]: [session uid=42 pid=3229] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Apr 18 12:38:41 server.hartings.se /usr/libexec/gdm-wayland-session[3217]: dbus-daemon[3229]: [session uid=42 pid=3229] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Apr 18 12:38:41 server.hartings.se /usr/libexec/gdm-wayland-session[3217]: dbus-daemon[3229]: [session uid=42 pid=3229] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Apr 18 12:38:41 server.hartings.se /usr/libexec/gdm-wayland-session[3217]: dbus-daemon[3229]: [session uid=42 pid=3229] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Apr 18 12:38:42 server.hartings.se /usr/libexec/gdm-wayland-session[3217]: dbus-daemon[3229]: [session uid=42 pid=3229] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Apr 18 12:38:42 server.hartings.se /usr/libexec/gdm-wayland-session[3217]: dbus-daemon[3229]: [session uid=42 pid=3229] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Apr 18 12:38:42 server.hartings.se /usr/libexec/gdm-wayland-session[3217]: dbus-daemon[3229]: [session uid=42 pid=3229] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Apr 18 12:38:46 server.hartings.se dbus-daemon[2086]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.resolve1.service': Unit dbus-org.freedesktop.resolve1.service not found.
Apr 18 12:38:58 server.hartings.se gnome-shell[4015]: g_strsplit: assertion 'string != NULL' failed
Apr 18 12:38:59 server.hartings.se gsd-color[3676]: failed to set screen _ICC_PROFILE: Failed to open file “/home/alla/.local/share/icc/edid-019a5d51594ffac2786d2198016790c2.icc”: Permission denied
Apr 18 12:38:59 server.hartings.se gsd-color[3676]: failed to set screen _ICC_PROFILE: Failed to open file “/home/alla/.local/share/icc/edid-019a5d51594ffac2786d2198016790c2.icc”: Permission denied
Apr 18 12:39:01 server.hartings.se gnome-software[4488]: failed to get featured apps: no apps to show
Apr 18 12:39:03 server.hartings.se gnome-shell[4015]: clutter_paint_volume_set_width: assertion 'width >= 0.0f' failed
Apr 18 12:39:03 server.hartings.se gnome-shell[4015]: clutter_paint_volume_set_height: assertion 'height >= 0.0f' failed
Apr 18 12:39:07 server.hartings.se gdm-launch-environment][3129]: GLib-GObject: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
Apr 18 12:39:07 server.hartings.se gsd-color[4315]: failed to connect to device: Failed to connect to missing device /org/freedesktop/ColorManager/devices/xrandr_Philips_Consumer_Electronics_Company_16843009_gdm_42
Apr 18 12:39:22 server.hartings.se pulseaudio[3909]: GetManagedObjects() failed: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[root@server ~]# 

So what is (are) the reason(s) for all of these problems? A gdm or gsd issue? Hardware? What can it be?

With my limited knowledge, the only explanation I can see for these random failures is a motherboard issue, as I find no errors in the DRAM or the disks, and I have the latest BIOS with fail-safe settings.

Can you see anything else in these logs with your experience?
All input is welcome!! THX.

With general weirdness like this I always suspect hard drives or controllers since random crap coming in from the hard drive can cause all sorts of issues.

The first thing I would do is try to reseat the controller board(s) and the hard drive connectors.

The second thing I would do is try a different controller board.

The third thing I would do is start swapping the hard drives out.

Then the motherboard.

At the point the problem goes away, the last thing you changed was causing the problem.

That looks like a hardware issue. Seeing issues after your pull-the-power-cord test may have been just a misleading coincidence.

Thanks Frank and ioplex.

I drew the same conclusions, contacted the shop where I bought all the parts, and requested a swap of the motherboard.

I will report back here when I have reinstalled everything again.

Thanks again all who responded!

Great help in this forum!

Ralf