My old (Core i5-750; that’s how old it is) not-a-server is running CentOS 6. I figured it was time to upgrade, and since Rocky 8 has been working well in a few VMs running on that host, I decided that would be the way to go.
The first issue I hit was that my disk controllers are no longer supported by RH. Fortunately this is a common issue (since they’re effectively the same as some of the older Dell PERC controllers) and the ELRepo mpt3sas driver still includes the mpt2sas support needed.
So far so good.
And the machine ran happily for 2 days.
And then froze solid. Nothing appeared on the console, nothing in the logs, it wouldn’t respond to any key presses. Basically solid-wedge.
Hard power cycle, everything came back up. 3 hours later another wedge. Repeat… and again 3 hours later, wedge.
The only difference I can think of between the 2 days of reliability and the 3 wedges is that I was doing intensive I/O: rsync’ing level-0 backups from a RAID-6 on the SAS2008 controllers to a RAID-0 on two external USB disks.
The first hang happened 10 minutes after the cron job started. After a reboot I restarted the rsync, and it hung again. The same thing happened on the third attempt.
After three hangs I reverted the system back to CentOS 6 (I’d got new SSDs for Rocky 8, so it was just an SSD swap) and the rsync then completed without error, as it had been doing for the past 2+ years (and even longer on different external disks).
So my gut feeling is that the mpt2sas component of the mpt3sas driver in ELRepo isn’t quite stable and hangs under heavy I/O.
Has anyone else seen this?