Upgrade to 8.7 breaks mdmonitor service

After applying the latest series of updates (8.7) to my production DNS/DHCP server, the 'mdmonitor' service no longer starts after a reboot. While this isn’t a showstopper by any means, the service worked without fault after the 8.6 update.

Of note:

  1. This server is configured with two disks in a RAID 1 configuration.
  2. After the upgrade and reboot, I noticed that mdmonitor.service has a new PID.
  3. As I stated earlier, I’ve had no issues with this service in the past, but after considerable time spent Googling this particular issue I’ve come to realize that mdmonitor.service can be troublesome.

Specifics:

[root@dns ~]# journalctl -p 3 -xb
-- Logs begin at Wed 2022-11-16 13:36:50 PST, end at Wed 2022-11-16 13:45:04 PST. --
Nov 16 13:36:59 dns.<yadayada>.com systemd[1]: Failed to start Software RAID monitoring and management.
-- Subject: Unit mdmonitor.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit mdmonitor.service has failed.
-- 
-- The result is failed.

[root@dns ~]# systemctl status mdmonitor.service
● mdmonitor.service - Software RAID monitoring and management
   Loaded: loaded (/usr/lib/systemd/system/mdmonitor.service; enabled; vendor preset: enabled)
   Active: failed (Result: protocol) since Wed 2022-11-16 13:36:59 PST; 29min ago
  Process: 1214 ExecStart=/sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid (code=exited, status=0/SUCCESS)

Nov 16 13:36:59 dns.<yadayada>.com systemd[1]: Starting Software RAID monitoring and management...
Nov 16 13:36:59 dns.<yadayada>.com systemd[1]: mdmonitor.service: New main PID 1216 does not exist or is a zombie.
Nov 16 13:36:59 dns.<yadayada>.com systemd[1]: mdmonitor.service: Failed with result 'protocol'.
Nov 16 13:36:59 dns.<yadayada>.com systemd[1]: Failed to start Software RAID monitoring and management.

[root@dns ~]# rpm -qa | grep mdadm
mdadm-4.2-5.el8.x86_64

[root@dns ~]# mdadm --detail --scan
ARRAY /dev/md/imsm metadata=imsm UUID=ac8f5588:4cd58724:8cf76fff:ebe585bc
ARRAY /dev/md/Boot container=/dev/md/imsm member=0 UUID=54ac28a1:587e186c:e0ac32ea:41b6b68e

[root@dns ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/Boot_0 UUID=54ac28a1:587e186c:e0ac32ea:41b6b68e
ARRAY /dev/md/imsm UUID=ac8f5588:4cd58724:8cf76fff:ebe585bc
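When posting mdadm.conf next to the output of mdadm --detail --scan, one quick sanity check is whether both list the same array UUIDs. Here the device names differ (/dev/md/Boot_0 in the config versus /dev/md/Boot in the scan), but the UUIDs appear to match. The function below is a minimal sketch of that comparison; compare_array_uuids is a hypothetical helper name, and it deliberately compares only the UUID= fields of ARRAY lines, ignoring device names and other tags.

```shell
# compare_array_uuids CONF_FILE SCAN_FILE   (hypothetical helper)
# Prints every UUID that appears on an ARRAY line in one file but not the other.
# Device names and other tags (metadata=, container=, member=) are ignored.
compare_array_uuids() {
    awk '
        $1 == "ARRAY" {
            for (i = 2; i <= NF; i++) {
                # Match the UUID= token case-insensitively and normalise it.
                if (toupper(substr($i, 1, 5)) == "UUID=") {
                    u = tolower(substr($i, 6))
                    if (FILENAME == ARGV[1]) { conf[u] = 1 } else { scan[u] = 1 }
                }
            }
        }
        END {
            for (u in conf) if (!(u in scan)) print "only in " ARGV[1] ": " u
            for (u in scan) if (!(u in conf)) print "only in " ARGV[2] ": " u
        }
    ' "$1" "$2"
}
```

For example, mdadm --detail --scan | compare_array_uuids /etc/mdadm.conf - reads the scan from standard input; no output means every UUID appears in both lists.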

[root@dns ~]# cat /usr/lib/systemd/system/mdmonitor.service
[Unit]
Description=Software RAID monitoring and management
After=syslog.target
ConditionPathExists=/etc/mdadm.conf

[Service]
Type=forking
PIDFile=/var/run/mdadm/mdadm.pid
EnvironmentFile=-/etc/sysconfig/mdmonitor
ExecStart=/sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid

[Install]
WantedBy=multi-user.target

Please let me know if I can supply more data to help with a diagnosis.

And, as always… Thanks in advance!

I don’t understand this part?

When I checked the status of the mdmonitor service, this was part of the output:

Nov 16 13:36:59 dns.<yadayada>.com systemd[1]: mdmonitor.service: New main PID 1216 does not exist or is a zombie.

I assumed the upgrade created a new instance…

Yes, I noticed the PID being reported; it’s possibly related to ‘forking’. Can you check whether
‘/var/run/mdadm/mdadm.pid’ exists, or did exist, or tried to exist?
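To answer that question on the running system, a small check like the one below reports whether the PID file exists and whether the PID inside it belongs to a live process. It is only a sketch: check_pidfile is a hypothetical helper, it shows the state at the moment it runs (not whether the file briefly existed during boot), and kill -0 also succeeds for a zombie, so it cannot distinguish a zombie from a healthy process.

```shell
# check_pidfile PIDFILE   (hypothetical helper)
# Prints "missing", "alive <pid>" or "stale <pid>" and returns 1, 0 or 2 respectively.
check_pidfile() {
    local pidfile=$1 pid
    if [ ! -e "$pidfile" ]; then
        echo "missing"
        return 1
    fi
    pid=$(head -n 1 "$pidfile")
    # kill -0 sends no signal; it only tests that the process exists and can be
    # signalled (run as root to check processes owned by other users).
    # Note: it also succeeds for a zombie process.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "alive $pid"
        return 0
    fi
    echo "stale ${pid:-<empty>}"
    return 2
}
```

On the affected server this would be run as root with check_pidfile /var/run/mdadm/mdadm.pid.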

I have discovered a workaround to get mdmonitor working for now. Since I’m working remotely from home, I will post what I learned from troubleshooting the ‘/var/run/mdadm/mdadm.pid’ issue tomorrow when I have more time.

BTW, thanks for the ‘nudge’ to look into the PID issue, gerry666uk!

I upgraded to 8.7 today and I have the same error.
@mdelfava
Can you post your workaround?

I humbly apologize for the delay in posting my workaround. We were out of town for a few days visiting family over the Thanksgiving holidays. I will endeavor to be more prompt in the future…

My workaround for this issue, per: https://bugzilla.redhat.com/show_bug.cgi?id=2023092

I made a backup of: /usr/lib/systemd/system/mdmonitor.service

I then deleted the following line from the mdmonitor.service file:

PIDFile=/var/run/mdadm/mdadm.pid

The mdmonitor service then started manually, and I confirmed that it also starts successfully after a reboot.
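The workaround above edits the vendor unit in /usr/lib/systemd/system directly, and an mdadm package update may overwrite that file. A common alternative is to put a modified copy in /etc/systemd/system, which systemd gives precedence over /usr/lib/systemd/system. The function below is a minimal sketch of that variant; apply_pidfile_workaround is a hypothetical name, and the default paths are assumptions taken from the unit file shown earlier in the thread.

```shell
# apply_pidfile_workaround [SRC_UNIT] [DST_UNIT]   (hypothetical helper)
# Copies the packaged unit to the admin override location without its PIDFile= line.
# The original file under /usr/lib is left untouched.
apply_pidfile_workaround() {
    local src=${1:-/usr/lib/systemd/system/mdmonitor.service}
    local dst=${2:-/etc/systemd/system/mdmonitor.service}
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    # Write to a temporary file first so a failure never leaves a half-written unit.
    local tmp="$dst.tmp.$$"
    grep -v '^PIDFile=' "$src" > "$tmp" && mv "$tmp" "$dst"
}
```

Run it as root, then apply the change with systemctl daemon-reload followed by systemctl restart mdmonitor.service. Because the copy in /etc shadows the packaged unit entirely, it also hides any later fix to that unit; deleting /etc/systemd/system/mdmonitor.service and running systemctl daemon-reload returns to the packaged version.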

According to the bugzilla post above:

“This points to the issue being that there’s a race between systemd accessing the file set up in ‘PIDFile=’ and mdadm deleting the pidfile after exiting…”

If anyone has further thoughts please let me know…
