To me the Avail column values are realistic, but the Size and Used ones are driving me nuts.
On the root of the file system I issued the following command. for d in $(ls) ; do du -sh $d 2>/dev/null; done
The output follows.
Sometimes processes can block disk space, until the service/process has been restarted. I’ve had such situations like that. Have you tried restarting the server to see if that 42GB becomes available?
Theres something else:
It shows /dev/nvme0n1p5 and /dev/nvme0n1p6 to have 64 *Zeta*bytes.
That cant’t be true, so I suspect that the partition table of your NVME disk is somehow corrupted.
Later:
Maybe not the partition table, but definitively the filesystem (superblock?). So you should
check, if lsblk and fdisk -l report correct partition sizes.
I respect the request of avoiding screenshots, though, as you can see, in this case upon pasting I get pipes as separation characters and I don’t know why. That was the reason why I resorted to screenshots.
So your partitions look ok.
A hunch (hopefully not true):
In the partition table, a Microsoft partition is on /dev/nvme0n1p4, just before the corrupted
linux partitions.
So:
Could it be this partition was bigger in the past, and when you installed Rocky,
you resized the partition but forgot to resize the filesystem (NTFS?) in it and you
ran Windows before those strange errors did occur?
If yes: You data is most likely overwritten (lost) and you know at least the reason why.
If no, I’m out of ideas unfortunately.
/dev/nvme0n1p4 is simply the partition I made to install Windows there. I partitioned the entire SSD disk from scratch. Windows was installed first and then I installed Rocky Linux in the space I left for it, presumably yet not partitioned at the time I installed Windows.
/dev/nvme0n1p5, /dev/nvme0n1p6 and /dev/nvme0n1p7 do have the size I could expected from there after partitioning. As for the no space left on device, it turned out it was probably due to space exhaustion in /var/tmp, that I configured as a temporary file system in RAM, but unfortunately with too much limitation in size - I could easily mend for it, once discovered the problem. Elsewhere, in spite of the 100% usage indication, I could successfully create huge dummy files with dd and no error raised.
I’m pretty much ignorant concerning such issues, but could it be just a SSD-related issue? Maybe something related to the way the SSD disk is controlled? I remember a setting in the BIOS configuration…
For some reason I decided to install the system again, in the same partitions seen before.
Mysteriously now the behavior is normal, as far as sizes are concerned.
First of all, when using ‘du’ use the ‘-x’ switch .
When a process uses a file, in /proc/{pid}/fd there is a “reference” to that file.
So imagine that an app writes to a log that is never rotated and that becomes 30G. Then an unexperienced user would ‘rm -f’ that log.
Yet, the space won’t be released until the process is dead.
In such scenarios use ‘lsof’ and search for deleted files. Once you find it, go to /proc/{pid}/fd/ and use ‘truncate -s 0 {number of the fd}’ to release the space without restarting the application.
P.S.: You can also copy deleted files by ‘cp {number of fd} /{new_destination}’.