How do I diagnose RL v8.6 boot issue?

The old (hardware) system that I use as a local fileserver stopped booting after I attempted to install support for ntfs drives.

The hardware has a SATA controller with 4 plug-in bays. It has been happily running for months with an xfs filesystem mounted at /mnt/internal_hd0. The physical volume is installed in the top slot of the SATA controller.

I know that the volume I want to mount has an ntfs filesystem with about 1TB of stuff on it. I therefore used dnf to add ntfs support. I then shut down the system, installed the second volume in the second slot, and attempted to start the system.

Here are the specific commands I issued (as root) to add ntfs support:

dnf -y update
dnf -y install ntfs-3g
dnf install ntfsprogs -y
shutdown -r now

All seemed fine.

The system now fails to mount /mnt/internal_hd0 on startup. On the console, I see a complaint that says:

[ TIME ] Timed out waiting for device dev-disk-by\x2duuid-1f97ecf3\x2d71db\x2d43c4\x2d82c3\x2d9d4750354b4b.device.

It then boots into “emergency mode”.

Is this a hardware or software issue? I note that /etc/fstab mounts /mnt/internal_hd0 by UUID, and the UUID is:

1f97ecf3-71db-43c4-82c3-9d4750354b4b
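
From memory, the full entry looks something like this (the options and trailing fields here are approximate, not a copy of the actual file):

UUID=1f97ecf3-71db-43c4-82c3-9d4750354b4b  /mnt/internal_hd0  xfs  defaults  0 0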

In “emergency mode”, I’m able to examine /dev/disk/by-uuid. When I do, I see three entries – none of which match the UUID specified in /etc/fstab.
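
For anyone reproducing this, the examination is just a directory listing, e.g.:

ls -l /dev/disk/by-uuid

which shows each UUID as a symlink to its /dev/sdX node.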

I thought that the UUID was not supposed to EVER change! What happened?

I’ve now removed the two packages:

dnf remove ntfsprogs
dnf remove ntfs-3g

This had no apparent effect.

Is this a hardware or software failure?

I’d like suggestions about how to:

  1. Get the system running again (hopefully without trashing the contents of /mnt/internal_hd0)
  2. Mount an ntfs volume (perhaps on /mnt/internal_hd1)

Here is an update with some additional information.

Apparently one or both of ntfs-3g and ntfsprogs breaks my SATA controller, so that it doesn’t see any volume inserted into any drive bay.

I was able to recover the system by replacing /etc/fstab with the prior version that doesn’t attempt to mount /mnt/internal_hd0. I then powered down the system, removed the physical volume, and powered up the system. All was fine.

I then powered down the system, inserted the physical volume into the slot, and powered up. The system again booted cleanly. In this state, I was able to see 1f97ecf3-71db-43c4-82c3-9d4750354b4b in /dev/disk/by-uuid – it is linked to /dev/sda. I’m also able to see the UUID and volume in lsblk.

I then restored /etc/fstab to the version that mounts the above UUID on /mnt/internal_hd0.

All seems fine.

So the system is running again, answering my question 1 above.

I’ll open a new topic for question 2 – how to mount an existing ntfs physical volume on RL v8.6.

UUIDs are unique to each formatted file system on each partition, not to the mount point. Since this thread has the relevant information, I am responding here rather than in the old post, which has nothing to do with your issue beyond getting the appropriate modules installed.
As you found out, you can’t substitute one drive for another on the same mount point recorded in fstab. Also, you don’t mount drives; you mount the partitions on the drive.

I have a current Fedora 35 system on which I mount an ntfs-formatted partition, so I can use it as an example.

If you are not going to keep using the xfs-formatted partition you have mounted at

/mnt/internal_hd0

then you need to remove or comment out its line in fstab before you disconnect it. If, on the other hand, you are just adding another drive on a different connector with the ntfs partition while keeping the xfs partition in use, then for right now you do not need to touch the fstab file. I’m going to assume the latter.
Shut down, connect the drive with the ntfs partition on it, and then restart.
Run lsblk thus:

lsblk -o name,size,type,fstype,uuid,mountpoints

On my system, filtering for the ntfs partition gives:

$ lsblk -o name,size,type,fstype,uuid,mountpoints | grep ntfs
├─sda6   5.9G part ntfs   462CBBAE47405341       /data/finjk

Note the unique UUID starting with 46 and save it in a text file for use when editing fstab for your new mount point.
Create a new mount point

/mnt/internal_hd1
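
If the directory doesn’t already exist, create it first, e.g.:

mkdir -p /mnt/internal_hd1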
Then mount the ntfs partition:

mount -t ntfs /dev/sdx1 /mnt/internal_hd1

Replace the “x” with the drive letter noted in the output of lsblk. Now, if you’ve gotten this far without errors, then we can edit the fstab file for the new entry. Here is the entry in my fstab:

UUID=462CBBAE47405341                     /data/finjk             ntfs    defaults        0 0

Before editing the fstab, copy the current one to fstab.orig so that you have a handy backup. You seem to have done that already, but this is for others.
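
Also, once the new entry is in place, you can test it without rebooting; something like this works (assuming the partition is still mounted from the manual mount above):

umount /mnt/internal_hd1
mount -a
findmnt /mnt/internal_hd1

mount -a attempts everything listed in fstab, so a typo shows up here rather than at the next boot.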

Hope this helps

I appreciate this quick and helpful response.

Let me try to clarify my issue. I am under the perhaps mistaken impression that Rocky Linux v8.6 does not have the ability to handle ntfs as the filesystem type in the fstab entry you offer.

That’s why I started by attempting to load the two packages using dnf.

It looks to me as though, when I installed those two packages, they broke my SATA controller so that it would not spin up or see ANY of the volumes inserted in any drive bay.

Hence, the drive with the UUID specified in fstab didn’t exist – and hence the system would not boot cleanly.

In fact, the UUID didn’t change (as I feared in my original post) – it simply wasn’t present at all.

I see now that I neglected to append ,nofail to the default fstab entry for /mnt/internal_hd0 – that likely contributed to the issue.
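
For anyone following along, the corrected entry would look something like this (sketching the options and trailing fields from memory):

UUID=1f97ecf3-71db-43c4-82c3-9d4750354b4b  /mnt/internal_hd0  xfs  defaults,nofail  0 0

With nofail, systemd notes the missing device and continues booting instead of dropping into emergency mode.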

I’ll try your fstab entry, but I’m pretty sure that I cannot successfully use ntfs as the filesystem type in Rocky Linux v8.6 (hence the new topic that I have not yet created).

Support for a filesystem type should not affect any devices; any volume/block device could have a particular filesystem. There is something else going on in your setup.

That UUID is specific to the xfs file system on the drive in the first bay. If you try to apply it to a different file system on another partition on a different drive, your system is going to crash, as happened here. You were lucky to get to emergency mode; I’ve had instances where I could not even boot without a rescue disk to get back into the system.
The fact that you had just installed some software was a coincidence of the sequence of events, as captured in the quote above.

So does this appear to be a transient hardware issue?

I thought/hoped that when I added ntfs-3g or ntfsprogs, it installed a driver or something similar that was incompatible with my aging hardware.

The system is installed on an SSD mounted on the SCSI controller and that was unaffected. The crash happened because the entry for the failing UUID in fstab did not have the “,nofail” option specified.

So the failure mechanism was that the SATA controller refused to load anything, causing the UUID to be absent, causing the attempted mount to fail, dumping me into emergency mode.
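
One way I can check that on a future boot is to look at the kernel’s per-port SATA link messages, e.g.:

dmesg | grep -i 'ata[0-9]'

A healthy, occupied port reports something like “SATA link up”; a dead or empty one reports “SATA link down”.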

I’ll try repeating the whole process and see if the failure recurs. If it does, then I’ll blame the software. If I see no issues, then I’ll assume it was a transient HW issue.

Here is another and perhaps more interesting update.

It appears that there is a hardware issue with the second of the four hot-swap bays. The problem recurred when I started the working system with the volume inserted in the second bay – the SATA controller apparently failed altogether.

I then shut down the system and moved the volume from the second to the third bay.

After restart, the system booted normally. The (unmounted) new volume is visible in lsblk and blkid as follows:

# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0   9.1T  0 disk /mnt/internal_hd0
sdb           8:16   0 931.5G  0 disk 
└─sdb1        8:17   0 931.5G  0 part 
sdc           8:32   0   3.7T  0 disk 
├─sdc1        8:33   0     1M  0 part 
├─sdc2        8:34   0     1G  0 part /boot
└─sdc3        8:35   0   3.7T  0 part 
  ├─rl-root 253:0    0    70G  0 lvm  /
  ├─rl-swap 253:1    0   5.9G  0 lvm  [SWAP]
  └─rl-home 253:2    0   3.6T  0 lvm  /home
sr0          11:0    1  1024M  0 rom  
# blkid
/dev/sda: UUID="1f97ecf3-71db-43c4-82c3-9d4750354b4b" BLOCK_SIZE="4096" TYPE="xfs"
/dev/sdb1: BLOCK_SIZE="512" UUID="6EC475DDC475A84B" TYPE="ntfs" PTTYPE="atari" PARTUUID="2cc3175b-01"
/dev/sdc1: PARTUUID="8f242cdb-729c-4489-80d0-54ffd8cf8e33"
/dev/sdc2: UUID="3f9b52c8-a9ee-4fce-86ed-259b27e80f05" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="b63aaa00-689e-40b9-9db3-2771bb52fc65"
/dev/sdc3: UUID="ihHjYm-FmVM-YNE8-X1ji-20Kz-MzsK-FtAB9b" TYPE="LVM2_member" PARTUUID="2371a503-64a8-4f76-b0c7-d461f77e1df5"
/dev/mapper/rl-root: UUID="e4c65b15-594e-41c5-8fb8-b282187c4edb" BLOCK_SIZE="512" TYPE="xfs"
/dev/mapper/rl-swap: UUID="623d3b0a-6bcd-4658-a4e3-8e3e15797785" TYPE="swap"
/dev/mapper/rl-home: UUID="eafb0bee-cc06-43de-be37-7d7879b7a938" BLOCK_SIZE="512" TYPE="xfs"

The new drive appears in lsblk as sdb and sdb1. From blkid, the new partition has a UUID of 6EC475DDC475A84B and a type of ntfs (as expected).
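
Assuming the driver question works out, I expect the eventual fstab entry for this volume to look something like the following sketch (combining the advice above with the nofail lesson):

UUID=6EC475DDC475A84B  /mnt/internal_hd1  ntfs  defaults,nofail  0 0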

This hardware ran as a Windows system for a decade, and during those years I kept this volume in the third bay. I’m curious whether anything will work in the second bay, or whether perhaps some hidden Microsoft magic causes these physical drives to work only in the same bay that they occupied when formatted.

Anyway, it appears that the specific issue that motivated this topic is a hardware issue, unaffected either by ntfs or by either of the two packages (ntfs-3g and ntfsprogs).

One last question – is the text that scrolls by on the console during start-up logged someplace? I’m not even sure what tool produces it – this is the text that scrolls by prior to the screen that lets me choose which Linux to launch.

I appreciate the attention of all parties!

Linux doesn’t really care which bay the drive is in. If you use the UUID to mount the partition, then it can be in any slot and it will still mount, whether it appears as /dev/sdbX, /dev/sdcX, etc. Obviously, if /etc/fstab has /dev/sdb1 to mount and you then move the drive to a different SATA port, which makes it /dev/sdc1, then the system will not boot, and you’ll need to rescue it at that point – similar to what you already experienced.
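
To illustrate the difference (the mount point and option fields here are placeholders):

# fragile – depends on device enumeration order:
/dev/sdb1              /mnt/internal_hd1  ntfs  defaults  0 0

# robust – follows the file system to whichever port it lands on:
UUID=6EC475DDC475A84B  /mnt/internal_hd1  ntfs  defaults  0 0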

Yes, it should be in /var/log/messages, and you can also use:

dmesg

from the command line to view stuff.

I appreciate the pointers to /var/log/messages and dmesg.

Indeed, I’m familiar with the semantics of /etc/fstab and mount (that’s why I prefer to use UUID in /etc/fstab).

The drive in question was formatted years ago by some Windows tool (I don’t remember which). The original BIOS on this system (built in 2010) supported hot-swap on these four bays – when the system was new, it was possible to literally open the door, pull out the still-spinning drive, insert another, and close the door. The (Windows XP) system would recognize the change and adjust to the new drive. I had to update the BIOS for other reasons sometime around 2014–2015, and the new BIOS did not allow hot-swap.

Since then, the procedure to change a volume has been to power down the system, change the volume, and power the system back up. In practice, this particular volume was always inserted in the third drive bay. I don’t remember whether I bought it before or after the BIOS update.

I’ll do some more experimentation to see if any volume at all works in the second bay, and I’ll also try the fourth bay to see if it works. I suspect there’s an issue with the data or power cables on the second bay – that’s the next thing I’ll check.

As always, I appreciate your attention and patience.

Your further clarifications clear up my misunderstanding of your issue.

I think this text is specific to your hardware BIOS, as I never see any text scroll by prior to the GRUB menu. On my Lenovo there is a menu for Lenovo’s diagnostic software that I can access with the F12 key, and something like that might exist in your BIOS. If there is any preboot information to be gleaned, it would be via a utility like that.
For post-boot logs you can use the journalctl utility as root to explore any and all log information. The man page covers all the options. Some I use are:

journalctl -b

This displays the current boot’s messages in a “less”-style pager. Adding a space and “-1” after the “-b” displays the previous boot’s messages, and counting further back (-2, -3, and so on) displays each earlier boot.
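
To see which boots the journal has retained, along with their index numbers, there is also:

journalctl --list-boots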

journalctl -u NetworkManager.service 

Will display messages specific to that service.
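
Adding -f to journalctl follows new messages live, much like tail -f on a log file:

journalctl -f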

Good luck in your further diagnostics.