[SOLVED!] The Further Adventures of a Vacuous Experimenter

OK it ONLY took several months to SOLVE the problem… and I’m not quite sure what I did to solve the problem!!

When I last left you I was knocking my head against a wall trying to figure out WHY Rocky Linux 8.5 would not install on my NVMe 4.0 drive but had no problems installing on an ancient (technologically speaking) 2 TB HDD. Installing it on the NVMe drive would throw the following ERROR message: "The following error occurred while installing. This is a fatal error and installation will be aborted: DNF ERROR: Error unpacking rpm package [some rpm package]."

Trying to copy openSUSE 15.3, which is on a 1 TB SATA drive, using the dd command ran into an “Out of space” error, and trying to copy Rocky Linux from the 2 TB drive (though the install is only 1 TB in size) also ran into the “Out of space” error. Now I reached for the last trick up my sleeve and pulled out a copy of CLONEZILLA. openSUSE 15.3 FAILED to clone (still telling me “Out of space”). A perusal of GParted showed me that out of 8 partitions only the first 3 were copied.

Trying to clone the 2 TB drive onto the 1 TB NVMe drive gave the “Sorry you can not clone a larger drive to the smaller drive” error, so I decided to go for a partition-to-partition clone. Back to GParted I went and wiped out the mess there. This time Clonezilla protested because the NVMe drive was empty, so I had to create exact copies of the various partitions. I used Rocky Linux 8.5 and tried a reinstall, and as expected it gave me the dreaded “The following error occurred…”. I went over to GParted and, much to my amusement, ALL THE PARTITIONS had been created: empty, but created!! Clonezilla now was saying the partition sizes were WRONG.

I began to think my problem was due to BAD BLOCKS, so I decided to check this out. I came across the following commands:

smartctl -H /dev/nvme0n1

The report came back and said it was fine. Next I ran

badblocks -v /dev/nvme0n1

It said I had ZERO bad blocks. According to these two tests I had a good drive, as I would expect, since it is less than 2 years old. Now I suspected the problem was either with the motherboard or a loose connection. This required me to REMOVE THE NVMe drive and reseat the thing.
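For anyone wanting to repeat the two checks above, here is a minimal sketch that wraps them in one shell function. The names check_drive and DRY_RUN are made up for this example; only smartctl -H and badblocks come from the post (the -s progress flag is an addition). It defaults to a dry run that just prints what it would do, since both tools need root and badblocks can take a while on a 1 TB drive.

```shell
#!/bin/sh
# Sketch only: print (or run) the two read-only checks from the post.
# check_drive and DRY_RUN are illustrative names, not part of any tool.
check_drive() {
    dev=$1
    for cmd in "smartctl -H $dev" "badblocks -sv $dev"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only show what would be executed.
            echo "would run: $cmd"
        else
            # Unquoted on purpose so the string splits into command + arguments.
            $cmd || echo "FAILED: $cmd"
        fi
    done
}

# Dry run by default; call with DRY_RUN=0 as root to really run the checks.
check_drive /dev/nvme0n1
```

If both checks come back clean, as they did here, smartctl -a prints the fuller SMART report and may be worth a look before suspecting the hardware around the drive.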

For most people removing an NVMe drive would not be a big thing, but removing a teeny, tiny screw when one has Parkinson’s Disease is a sure recipe for disaster. I rooted around in my bin that holds 3.5" floppies going all the way back to DOS 3.5 - DOS 6.22, as well as jeweler’s screwdrivers, and found one with a magnetized tip that I had moved there from a larger collection of screwdrivers, and prayed it would hold the teeny tiny screw. It did!! Next I removed the drive and decided to move it over to the 2nd NVMe 4.0 slot, but that slot required a taller screw, so I ended up putting the drive back into the same slot I had removed it from. If worse came to worst I could root around and try to find a taller standoff and the longer screw. I reseated the drive to make sure it was firmly seated, and then it was time to reinstall the screw. I fitted it onto my magnetized screwdriver, fitted it into the hole, and tightened the screw down.

Having done this I then did a re-install of Rocky Linux 8.5, creating the EXACT SAME SIZE partitions so that I could do a partition-to-partition copy from the 2 TB drive to the 1 TB NVMe drive. I selected the “Server With GUI” option with no additional packages selected, finished off the initial setup routine, and began the INSTALL.
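Since both dd and Clonezilla had complained about sizes earlier, it can help to compare exact byte counts before attempting a partition-to-partition copy. Drives and partitions with the same nominal size can differ by a few bytes, and a target that is even slightly smaller will not take a raw copy. The sketch below is only an illustration: fits_on_target is a made-up helper, and /dev/sda is a placeholder for whatever the source device really is.

```shell
#!/bin/sh
# Sketch only: decide whether a source of SRC bytes fits on a target of DST bytes.
# fits_on_target is an illustrative name, not an existing command.
fits_on_target() {
    src_bytes=$1
    dst_bytes=$2
    if [ "$src_bytes" -le "$dst_bytes" ]; then
        echo "fits: $((dst_bytes - src_bytes)) bytes to spare"
    else
        echo "too big by $((src_bytes - dst_bytes)) bytes"
        return 1
    fi
}

# On a real system (as root) the byte counts would come from blockdev, e.g.:
#   fits_on_target "$(blockdev --getsize64 /dev/sda)" \
#                  "$(blockdev --getsize64 /dev/nvme0n1)"
# The same works per partition (placeholder names such as /dev/sda1, /dev/nvme0n1p1).
```

Running the check per partition pair before starting a copy shows up a size mismatch right away instead of partway through.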

I fully expected the install to reach block 3627904 or thereabouts (according to Clonezilla while trying to clone openSUSE 15.3) and then blow up; instead it ran all the way through the package installation. OK, so it was going to blow up during the configuration phase. NOPE! Indeed it is now up and running. While I still have another 1-2 weeks of installing other packages, configuration, tweaks, etc., Ocelot is now up and running. Indeed I am posting from Ocelot now!!

CONCLUSION:

So what went WRONG? NO IDEA!! The motherboard is an ASUS Prime X570 Pro, the CPU is an AMD Ryzen 9 5900X 12-core, and the NVMe is the Corsair Force MP600 PCIe 4.0 1 TB drive. I suspected that one of these 3 things was the problem. While this is just a GUESS, I now think it was a combination of the motherboard NVMe slot being a tad too long and the screw head being a tad too small, with the head just catching the lip of the drive to hold it in place rather than going through the center of the cutout. A screw with a slightly wider head would have been better. Combine that with the constant shock of picking the computer up, opening it up for maintenance, etc., etc., and the drive may have been jarred just a bit so that it was not making a good connection. That still does not explain WHY CentOS 8.3 installed but Rocky Linux 8.4 did not. It also does not explain WHY I had no problems until I rolled the machine over from Rocky Linux 8.4 to 8.5. It could just be I had a tad bit of LUCK and THIS TIME the install went through, but NEXT TIME??
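One way to look for evidence of a marginal connection, should the problem come back, is to search the kernel log for NVMe and PCIe error messages. This is only a sketch: nvme_log_filter is a made-up name, and the search patterns are a guess at what is useful rather than a complete list of what the kernel can print.

```shell
#!/bin/sh
# Sketch only: keep kernel log lines that mention NVMe or PCIe error reporting.
# nvme_log_filter is an illustrative name; the patterns are assumptions.
nvme_log_filter() {
    grep -Ei 'nvme|pcie bus error|aer:'
}

# Typical use, as root, reading the log from standard input:
#   dmesg | nvme_log_filter
#   journalctl -k -b | nvme_log_filter
```

An empty result does not prove the connection is good, but timeouts or bus errors showing up around a failed install would point at the slot or seating rather than the installer.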

With Rocky Linux 8.5 now installed on the 1 TB NVMe drive and openSUSE Leap 15.3 on the 1 TB SATA drive, I still need to finish pulling some files from the corrupt Rocky Linux 8.5 backup (mostly backgrounds, plus checking where certain configuration files are located, etc.). After that I get to blow that copy away and set up the 2 TB HDD to start receiving and storing BACKUP files, as was ORIGINALLY intended. I never quite managed to get to that point because the 2 TB HDD has served as a scratch disk while I tried to juggle first 4, then 3, and then the final 2 OSes.

If your NVMe drives suddenly seem to get flaky, it may be worth yanking them and reseating them, especially if you can do a re-install on an alternate drive. It also pays to BACK UP the drive periodically, so that if the next point release proves problematic you can simply roll back to the last release. That was “The Plan” here, but I never got this rig to that point.

After several MONTHS, mark this SOLVED!!! (I Think!).

D’Cat