I am using a 1 TB NVMe disk as a fast cache for a logical volume:
[root@host ~]# lvs
  LV   VG Attr       LSize   Pool          Origin       Data%  Meta%  Move Log Cpy%Sync Convert
  root rl Cwi-aoC--- <14.52t [cache_cpool] [root_corig] 100.00 0.11            20.20
  swap rl -wi-ao----  32.00g
[root@host ~]# pvs
  PV             VG Fmt  Attr PSize    PFree
  /dev/md126     rl lvm2 a--    14.55t       0
  /dev/nvme0n1p1 rl lvm2 a--  <931.51g <11.51g
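For reference, the layout above was created roughly as follows (a sketch reconstructed from the lvs/pvs output, not the exact commands used; sizes and extent counts may differ):

pvcreate /dev/nvme0n1p1
vgextend rl /dev/nvme0n1p1
# carve (nearly) all of the NVMe PV into a cache pool
lvcreate --type cache-pool -l 100%FREE -n cache_cpool rl /dev/nvme0n1p1
# attach the pool to the existing root LV, turning it into a cached LV
lvconvert --type cache --cachepool rl/cache_cpool rl/root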
It works fine while the cache's Data% is small, around 20%. However, once the cache is nearly full (~100%), filesystem errors show up again just tens of hours after an xfs_repair:
[root@host ~]# du -hs
du: cannot access './file/path1/path2/filename1': Structure needs cleaning
du: cannot access './file/path1/path2/filename2': Structure needs cleaning
So xfs_repair has to be run yet again. This is a big problem.
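For completeness, each repair cycle looks roughly like this (a sketch; /mnt/data is a hypothetical mount point, /dev/rl/root is the cached LV from above):

# the filesystem must be offline during repair
umount /mnt/data                 # hypothetical mount point
# dry run: report inconsistencies without modifying anything
xfs_repair -n /dev/rl/root
# real repair; this is the step that takes tens of hours here
xfs_repair /dev/rl/root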
I checked the drive's error log with the nvme command:
[root@host ~]# nvme error-log /dev/nvme0|head
Error Log Entries for device:nvme0 entries:64
.................
Entry[ 0]
.................
error_count : 76
sqid : 0
cmdid : 0x17
status_field : 0x2109(Invalid Log Page: The log page indicated is invalid)
phase_tag : 0
parm_err_loc : 0x28
The error_count suggests that this NVMe disk has logged some errors.
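To tell real media failures apart from benign entries (the status shown above, Invalid Log Page, can also come from harmless queries for unsupported log pages), the SMART log is worth cross-checking (field names as printed by nvme-cli):

# media_errors counts unrecovered data-integrity errors;
# a non-zero num_err_log_entries alone is not proof of a failing drive
nvme smart-log /dev/nvme0 | grep -Ei 'critical_warning|media_errors|num_err_log_entries'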
Testing:
I tried rebuilding the lvm-cache (a sketch of the rebuild follows below). It still works fine while the cached data is small, but not once the cache is nearly full.
I also tried a brand-new NVMe disk (roughly the highest-quality consumer NVMe model from Samsung), and the same kind of problem still occurs.
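The rebuild was along these lines (a sketch, assuming the VG/LV names shown above):

# detach the cache; dirty blocks are flushed back to /dev/md126,
# then the pool is removed
lvconvert --uncache rl/root
# recreate the pool on the NVMe PV and attach it again
lvcreate --type cache-pool -l 100%FREE -n cache_cpool rl /dev/nvme0n1p1
lvconvert --type cache --cachepool rl/cache_cpool rl/root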
Solutions?
I was wondering whether it is possible to cap usage of the NVMe disk at 30-40% of its capacity, and perhaps thereby make the NVMe effectively work in MLC mode (or even SLC mode, by using less than ~25% of the disk) instead of TLC. That should drastically improve data reliability and make the filesystem less prone to errors.
Question:
So, is there any way to configure lvm-cache so that it does not use the whole cache disk?
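Something like the following is what I have in mind (a minimal sketch, assuming the names above; 300G is an arbitrary example size, roughly 30% of the 1 TB disk):

# after an lvconvert --uncache rl/root, recreate the pool with a fixed
# size instead of 100%FREE, leaving most of the NVMe unallocated
lvcreate --type cache-pool -L 300G -n cache_cpool rl /dev/nvme0n1p1
lvconvert --type cache --cachepool rl/cache_cpool rl/root
# LVM never writes the unallocated space on /dev/nvme0n1p1, so the SSD
# controller could use it for over-provisioning / pseudo-SLC caching

Whether the controller actually exploits never-written space that way is drive-specific, so that part is a guess.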