Rocky Server Crashed This Morning

synack2 · July 9, 2022, 3:34pm

My personal server decided to crash this morning at about 5:30 AM. I was doing a dnf update and it crashed partway through, saying it couldn’t write a file. The whole OS drive was offline and dead. It was a 120GB SSD that was several years old. This server had been upgraded in place from CentOS 7 to 8.2, 8.3, CentOS Stream, and then to Rocky, so it was time for a good nuke and pave anyway. Luckily I keep copies of my smb.conf, iptables, main.cf, and crontab on my backup drive, so restoring it was a breeze. The main RAID survived unharmed and mounted right back up, so I didn’t have to rsync 8TB from my offsite backup server, which was nice!

berlin · July 10, 2022, 1:06pm

That sounds like your SSD failed.

It’s good that you had backups.

By the way, if you mirror your system across two SSDs, for example with mdadm RAID1 for ‘/boot’, ‘/boot/efi’, and ‘/’, the system will keep running after one of the SSDs fails.
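One catch with a mirror is that it keeps running quietly after losing a disk, so it is worth checking for degraded arrays. A minimal Python sketch, assuming the usual /proc/mdstat layout, where a healthy two-disk mirror reports [2/2] [UU] and a degraded one reports [2/1] [U_]. The function name and the sample text are hypothetical illustrations, not output from the poster’s machine.

```python
import re
from typing import Dict

def degraded_arrays(mdstat_text: str) -> Dict[str, str]:
    """Return {array_name: member_status} for md arrays with a missing member.

    Assumes each array line ("md0 : active raid1 ...") is followed by a status
    line ending in "[n/m] [U...]"; an underscore marks a failed/missing member.
    """
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array we are inside
            continue
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                result[current] = status.group(1)
            current = None  # status line found; wait for the next array
    return result

# Hypothetical /proc/mdstat contents: md0 healthy, md1 missing one disk.
SAMPLE_MDSTAT = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1046528 blocks super 1.0 [2/2] [UU]

md1 : active raid1 sda2[0]
      116224000 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
"""

print(degraded_arrays(SAMPLE_MDSTAT))  # prints {'md1': 'U_'}
```

On a real system you would pass in the text of /proc/mdstat and alert on a non-empty result; mdadm’s own monitor mode is the more complete option for this.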

gerry666uk · July 10, 2022, 9:34pm

Yes, but it’s worth keeping the logs from just before the “crash” and noting the exact error messages. If it was a failed SSD, there should be a large number of block I/O error messages (unless the logs were on the same physical drive).
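To see whether those block I/O errors are there, you can count them per device in a saved copy of the kernel log. A rough Python sketch: the regex is an assumption that covers the common wordings (“blk_update_request: I/O error, dev sda, …” on older kernels, “I/O error, dev sda, …” on newer ones, and “Buffer I/O error on dev sda1, …”), and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed pattern: kernel I/O error wording varies between kernel versions.
IO_ERROR = re.compile(r"I/O error,? (?:on )?dev (\w+)")

def count_io_errors(lines: Iterable[str]) -> Counter:
    """Count I/O error lines per device name as logged (disk or partition)."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical syslog lines, not taken from the crashed server.
SAMPLE_LOG = [
    "Jan  1 00:00:01 host kernel: blk_update_request: I/O error, dev sda, sector 2048 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0",
    "Jan  1 00:00:01 host kernel: Buffer I/O error on dev sda3, logical block 1024, lost async page write",
    "Jan  1 00:00:02 host systemd[1]: Started Session 4 of user root.",
]

print(dict(count_io_errors(SAMPLE_LOG)))  # prints {'sda': 1, 'sda3': 1}
```

As noted above, this only helps if the log survived somewhere other than the failing drive.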

synack2 · July 12, 2022, 2:50pm

Yeah, /var/log was on the boot drive as well. I confirmed the drive was bad by connecting it through a USB adapter to my Windows laptop and running some diagnostics on it. I keep really good backups of my data. A nightly script copies all of my configuration files to /data, a LUKS-encrypted volume on the RAID drives, and another script rsyncs that to an external drive, also LUKS-encrypted. Once a week I rotate the external drive out to a safety deposit box.
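The nightly config copy could look something like the following minimal Python sketch. It is not the poster’s script: the file paths are assumptions (typical Rocky locations for smb.conf, the iptables rules, Postfix’s main.cf, and root’s crontab), and backup_configs is a hypothetical helper that mirrors each file’s absolute path under a dated directory.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Iterable, List

# Assumed locations on Rocky; the original post does not list exact paths.
CONFIG_FILES = [
    "/etc/samba/smb.conf",
    "/etc/sysconfig/iptables",
    "/etc/postfix/main.cf",
    "/var/spool/cron/root",
]

def backup_configs(sources: Iterable[str], dest_root: str, stamp: str = "") -> List[Path]:
    """Copy each file to dest_root/<stamp>/<original absolute path>.

    Example: /etc/samba/smb.conf -> <dest_root>/<stamp>/etc/samba/smb.conf
    Missing files are skipped so one absent config does not abort the run.
    """
    stamp = stamp or date.today().isoformat()
    copied = []
    for src in map(Path, sources):
        if not src.is_file():
            continue
        target = Path(dest_root) / stamp / src.relative_to(src.anchor)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves mtime and permission bits
        copied.append(target)
    return copied
```

Run nightly (for example from cron) as backup_configs(CONFIG_FILES, "/data/config-backup"), where the subdirectory name is hypothetical; the existing rsync step to the external drive then picks it up along with the rest of /data.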

The external drive came in handy about 5 years ago when the feds kicked down my door, took my family out in handcuffs at gunpoint, and took all my computer equipment. Back then I didn’t use encryption, but I do now. I was never charged (because I never did anything illegal), but I never got any of my equipment back. Luckily I had my backup drive in my safety deposit box, so I could rebuild everything.

synack2 · July 12, 2022, 2:55pm

I have found software RAID1 on the boot partition to be problematic at best; it never works quite right. I would rather use a single drive, or a hardware RAID controller if I had one that worked, but my motherboard was built for Windows and a Linux driver was never released for its built-in RAID. So… I just keep a document with lots of notes on what I need to install, and I copy all of my config files over, so I can be back up and running in a few hours, which is acceptable for a home server.

jlehtone · July 12, 2022, 4:50pm

That is great. Configuration management systems (e.g. Ansible, Chef, Puppet, Salt) are essentially a machine-actionable version of those notes. Worth a look.
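The core idea behind those tools is declaring the desired state and changing only what differs, so re-running is safe. The following is a toy Python illustration of that idempotency idea, not how Ansible, Chef, Puppet, or Salt are implemented or configured; ensure_files is a hypothetical helper and handles text files only.

```python
from pathlib import Path
from typing import Dict, List

def ensure_files(desired: Dict[str, str]) -> List[str]:
    """Make each file match its desired text content; return paths that changed.

    Files already in the desired state are left alone, so a second run
    reports no changes (idempotency).
    """
    changed = []
    for path_str, content in desired.items():
        path = Path(path_str)
        if path.is_file() and path.read_text() == content:
            continue  # already correct: nothing to do
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
        changed.append(path_str)
    return changed
```

The first run writes any missing or drifted files; later runs return an empty list unless something has changed, which is what makes the “notes” safe to re-apply after a rebuild.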
