Stability issues with SAS2008 (LSI 9211-8i) controllers?

sweh · November 29, 2021, 5:05pm

My old (Core i5-750; that’s how old it is) not-a-server is running CentOS 6. I figured it was time to upgrade, and since Rocky 8 has been working well in a few VMs running on that host I decided that would be the path to go.

Now the first issue I hit was that my disk controllers are no longer supported by RH. Fortunately this is a common issue (since they’re effectively the same as some of the older Dell PERC controllers) and the ELRepo mpt3sas driver still includes the mpt2sas stuff needed.

So far so good.

And the machine ran happily for 2 days.

And then froze solid. Nothing appeared on the console, nothing in the logs, it wouldn’t respond to any key presses. Basically solid-wedge.

Hard power cycle, everything came back up. 3 hours later another wedge. Repeat… and again 3 hours later, wedge.

The only thing I can think of that was different between the 2 days of reliability and the 3 wedges is that I was doing intensive I/O, rsync’ing level0 backups from a RAID-6 on the SAS2008 controllers to a RAID0 on two external USB disks.

The first hang happened 10 minutes after the cron-job started. After each of the next two reboots I restarted the rsync, and each time the machine hung again.

After three hangs I reverted the system back to CentOS 6 (I’d got new SSDs for Rocky 8 so it was just an SSD swap) and then the rsync completed without error (which it had been doing for the past 2+ years, and even longer on different external disks).

So my gut feeling is that the mpt2sas component of the mpt3sas driver in ELRepo isn’t quite stable and hangs during times of heavy I/O.

Has anyone else seen this?

toracat · November 30, 2021, 1:13am

You might want to test-install kernel-ml from ELRepo.

The mpt3sas module in el-8.5 is 37.101.00.00. That in the current kernel-ml (5.15.x) is 39.100.00.00. So it may be worth a try.
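
To see which mpt3sas version a running kernel actually has loaded, something like the following may help. It is only a sketch: it assumes the module exposes its version at /sys/module/mpt3sas/version (the usual sysfs location for modules that declare a version), and the function names and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def loaded_module_version(name: str, sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the version string of a loaded kernel module, or None.

    Assumption: the module publishes <sysfs_root>/<name>/version.
    Returns None when the module is not loaded or declares no version.
    """
    version_file = Path(sysfs_root) / name / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        return None


def version_tuple(version: str) -> tuple:
    """Turn '39.100.00.00' into (39, 100, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    print("mpt3sas:", loaded_module_version("mpt3sas") or "not loaded / no version")
    # The two versions quoted above, compared numerically.
    print(version_tuple("39.100.00.00") > version_tuple("37.101.00.00"))
```

Comparing the versions as integer tuples rather than as strings avoids surprises when one component has more digits than the other.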

blank · December 4, 2021, 1:28am

How about trying to do high IO without the USB disks - just within the SAS2008?

sweh · December 4, 2021, 11:05pm

Fingers crossed… we’ll see how stable this is!

% uname -sr ; dmesg | grep mpt3sas | head -1
Linux 5.15.6-1.el8.elrepo.x86_64
[    1.868474] mpt3sas version 39.100.00.00 loaded

Thanks for the tip.

sweh · December 6, 2021, 3:32am

So far so good; it’s been up over a day and performed the weekly backup process without a hitch.

Now whether that’s the mpt3sas driver or some other change between RH-4.18 and mainline 5.15 kernels I can’t tell!

toracat · December 6, 2021, 6:09pm

That’s great news.

It could be due to the newer version of the mpt3sas driver, but you’re right, that’s hard to tell.

sweh · December 15, 2021, 4:38pm

A second busy weekend succeeded with no hangs. Yay!

Maybe related, maybe a coincidence…

On Saturday one of the drives in the RAID6 started to fail, and eventually dropped out of the array with unrecoverable errors. It may be that the power cycling while I was doing this work stressed it ('cos it was 7.5 years old!).

But it makes me wonder if it might have been playing up earlier and triggered an issue in the older driver (causing a hang) that the newer one correctly handled.

Replacing the disk caused a lot of I/O (since it’s 8*4TB in RAID6); continuous 60MB/s rebuild speed (as reported by mdstat) for 18 hours, and no problems reported. So that’s also another sign of stability with this kernel.
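
As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes decimal units (4TB = 4e12 bytes, 60MB/s = 60e6 bytes/s) and that the rebuild has to write the whole replacement disk once at a constant rate; the function name is just for illustration.

```python
def rebuild_hours(disk_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to write one full disk at a constant rate."""
    return disk_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    hours = rebuild_hours(4e12, 60e6)
    print(f"{hours:.1f} hours")  # prints "18.5 hours"
```

That comes out at roughly 18.5 hours, which lines up with the 18 hours the rebuild took.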
