Hi all,
I am using XFS on Rocky Linux 8, with three bind mounts under the same folder. I keep hitting an XFS internal error roughly every 24 hours. Each time, I have to unmount, run xfs_repair, and mount again to recover, which interrupts my service and leads to data loss.
Does anyone have a best practice or solution for this? Thanks.
Here is some basic info:
– XFS on /folder (1.5T total), with three bind mounts: /folder/a (~440G), /folder/b (~101G), /folder/c (~10G)
– OS release: Rocky Linux 8; XFS version: xfsprogs.x86_64 5.0.0-12.el8
– error log:
Jan 2 13:52:05 hybrid01 kernel: XFS (vdc): Internal error xfs_trans_cancel at line 957 of file fs/xfs/xfs_trans.c. Caller xfs_free_file_space+0x174/0x280 [xfs]
Jan 2 13:52:05 hybrid01 kernel: CPU: 13 PID: 2337066 Comm: dir /sensorsdat Tainted: G W OE -------- - - 4.18.0-553.16.1.el8_10.x86_64 #1
Jan 2 13:52:05 hybrid01 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
Jan 2 13:52:05 hybrid01 kernel: Call Trace:
Jan 2 13:52:05 hybrid01 kernel: dump_stack+0x41/0x60
Jan 2 13:52:05 hybrid01 kernel: xfs_trans_cancel+0xad/0x130 [xfs]
Jan 2 13:52:05 hybrid01 kernel: xfs_free_file_space+0x174/0x280 [xfs]
Jan 2 13:52:05 hybrid01 kernel: xfs_file_fallocate+0x14a/0x480 [xfs]
Jan 2 13:52:05 hybrid01 kernel: vfs_fallocate+0x140/0x280
Jan 2 13:52:05 hybrid01 kernel: ioctl_preallocate+0x93/0xc0
Jan 2 13:52:05 hybrid01 kernel: do_vfs_ioctl+0x626/0x690
Jan 2 13:52:05 hybrid01 kernel: ? syscall_trace_enter+0x1ff/0x2d0
Jan 2 13:52:05 hybrid01 kernel: ksys_ioctl+0x64/0xa0
Jan 2 13:52:05 hybrid01 kernel: __x64_sys_ioctl+0x16/0x20
Jan 2 13:52:05 hybrid01 kernel: do_syscall_64+0x5b/0x1a0
Jan 2 13:52:05 hybrid01 kernel: entry_SYSCALL_64_after_hwframe+0x66/0xcb
Jan 2 13:52:05 hybrid01 kernel: RIP: 0033:0x7f17bb01522b
Jan 2 13:52:05 hybrid01 kernel: Code: 73 01 c3 48 8b 0d 5d 6c 39 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d 6c 39 00 f7 d8 64 89 01 48
Jan 2 13:52:05 hybrid01 kernel: RSP: 002b:00007f179dd70138 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 2 13:52:05 hybrid01 kernel: RAX: ffffffffffffffda RBX: 00000000222d5ac0 RCX: 00007f17bb01522b
Jan 2 13:52:05 hybrid01 kernel: RDX: 00007f179dd70190 RSI: 000000004030582b RDI: 00000000000098e2
Jan 2 13:52:05 hybrid01 kernel: RBP: 00007f179dd701f0 R08: 864fd6762cc61123 R09: 0000000068f54c02
Jan 2 13:52:05 hybrid01 kernel: R10: 0000000000000050 R11: 0000000000000246 R12: 0000000006f93000
Jan 2 13:52:05 hybrid01 kernel: R13: 0000000000003000 R14: 0000000003c23624 R15: 00007f179dd702d8
Jan 2 13:52:05 hybrid01 kernel: XFS (vdc): Internal error xfs_trans_cancel at line 957 of file fs/xfs/xfs_trans.c. Caller xfs_free_file_space+0x174/0x280 [xfs]
Jan 2 13:52:05 hybrid01 kernel: CPU: 1 PID: 2337071 Comm: dir /sensorsdat Tainted: G W OE -------- - - 4.18.0-553.16.1.el8_10.x86_64 #1
Jan 2 13:52:05 hybrid01 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
Jan 2 13:52:05 hybrid01 kernel: Call Trace:
Jan 2 13:52:05 hybrid01 kernel: dump_stack+0x41/0x60
Jan 2 13:52:05 hybrid01 kernel: xfs_trans_cancel+0xad/0x130 [xfs]
Jan 2 13:52:05 hybrid01 kernel: xfs_free_file_space+0x174/0x280 [xfs]
Jan 2 13:52:05 hybrid01 kernel: xfs_file_fallocate+0x14a/0x480 [xfs]
Jan 2 13:52:05 hybrid01 kernel: ? futex_wake+0x144/0x160
Jan 2 13:52:05 hybrid01 kernel: vfs_fallocate+0x140/0x280
Jan 2 13:52:05 hybrid01 kernel: ioctl_preallocate+0x93/0xc0
Jan 2 13:52:05 hybrid01 kernel: do_vfs_ioctl+0x626/0x690
Jan 2 13:52:05 hybrid01 kernel: ? syscall_trace_enter+0x1ff/0x2d0
Jan 2 13:52:05 hybrid01 kernel: ksys_ioctl+0x64/0xa0
Jan 2 13:52:05 hybrid01 kernel: __x64_sys_ioctl+0x16/0x20
Jan 2 13:52:05 hybrid01 kernel: do_syscall_64+0x5b/0x1a0
Jan 2 13:52:05 hybrid01 kernel: entry_SYSCALL_64_after_hwframe+0x66/0xcb
Jan 2 13:52:05 hybrid01 kernel: RIP: 0033:0x7f17bb01522b
Jan 2 13:52:05 hybrid01 kernel: Code: 73 01 c3 48 8b 0d 5d 6c 39 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d 6c 39 00 f7 d8 64 89 01 48
Jan 2 13:52:05 hybrid01 kernel: RSP: 002b:00007f17b3fae138 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 2 13:52:05 hybrid01 kernel: RAX: ffffffffffffffda RBX: 0000000018ff49e0 RCX: 00007f17bb01522b
Jan 2 13:52:05 hybrid01 kernel: RDX: 00007f17b3fae190 RSI: 000000004030582b RDI: 00000000000071a6
Jan 2 13:52:05 hybrid01 kernel: RBP: 00007f17b3fae1f0 R08: ad732b924664055e R09: 00000000bb9703a7
Jan 2 13:52:05 hybrid01 kernel: R10: 0000000000000050 R11: 0000000000000246 R12: 0000000007be1000
Jan 2 13:52:05 hybrid01 kernel: R13: 0000000000003000 R14: 0000000003c23624 R15: 00007f17b3fae2d8
Jan 2 13:52:05 hybrid01 kernel: XFS (vdc): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0xc6/0x130 [xfs] (fs/xfs/xfs_trans.c:958). Shutting down filesystem
Jan 2 13:52:05 hybrid01 kernel: XFS (vdc): Please unmount the filesystem and rectify the problem(s)
Jan 2 13:52:05 hybrid01 systemd[1]: Started Process Core Dump (PID 2337089/UID 0).
Jan 2 13:52:05 hybrid01 systemd-coredump[2337090]: Resource limits disable core dumping for process 3179881 (replica_server).
Jan 2 13:52:05 hybrid01 systemd-coredump[2337090]: Process 3179881 (replica_server) of user 7001 dumped core.
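
In case it helps anyone reading the trace: both dumps enter through an ioctl (ORIG_RAX is 0x10) with RSI = 0x4030582b as the request number. Below is a minimal sketch that splits that value into the standard Linux ioctl fields, assuming the usual x86_64 layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction). The names decode_ioctl and IoctlFields are only illustrative, not from any library.

```python
from typing import NamedTuple


class IoctlFields(NamedTuple):
    direction: str  # data direction as seen from user space
    magic: str      # the "type" character of the ioctl family
    number: int     # command number within that family
    size: int       # size in bytes of the argument structure


# Direction values as used on x86_64 (assumed layout; some other
# architectures encode the direction bits differently).
_DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(request: int) -> IoctlFields:
    """Split a 32-bit ioctl request number into its four fields."""
    number = request & 0xFF               # bits 0-7
    magic = chr((request >> 8) & 0xFF)    # bits 8-15
    size = (request >> 16) & 0x3FFF       # bits 16-29
    direction = (request >> 30) & 0x3     # bits 30-31
    return IoctlFields(_DIRECTIONS[direction], magic, number, size)


if __name__ == "__main__":
    # RSI value taken from the register dump in the log above.
    print(decode_ioctl(0x4030582B))
    # IoctlFields(direction='write', magic='X', number=43, size=48)
```

As far as I can tell, type 'X', number 43 with a 48-byte argument corresponds to XFS_IOC_UNRESVSP64 in xfs_fs.h, which would fit the xfs_free_file_space frames in the trace: the application (replica_server) appears to be freeing ranges inside its files when the transaction gets cancelled. I have not confirmed this against the application source.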