We have had a few instances of servers crashing under high I/O load on Rocky Linux 9.x. The application uses an NFS share, and the load builds up over time (about 20-30 minutes) until the server becomes unresponsive, ultimately causing outages.
Hardware: HPE Gen8 servers, 256 GB RAM, 48 CPU cores, swap disabled
Role: Application servers running Kubernetes (K8s) with containerd
Behaviour:
The application hangs under heavy I/O, the system load (load average) climbs above 600, and the server becomes unresponsive.
This seems to happen only under high application load; the same application setup works fine on our CentOS servers (we are in the middle of migrating).
atop suggests the load is building up due to heavy reads/writes on the NFS mount, which is also driving local disk utilization high. We are trying to find out whether this could be a tuning or configuration issue.
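One thing worth keeping in mind when reading these numbers: the Linux load average counts not only runnable tasks but also tasks in uninterruptible sleep (state D), which is typically where threads end up while waiting on a slow or stalled NFS server, especially on hard mounts. A load of 600+ on 48 cores may therefore mean hundreds of blocked threads rather than saturated CPUs. Below is a minimal Python sketch, assuming Python 3.9+ (the Rocky 9 default) and read access to /proc on the host, that counts threads in D state grouped by command name. The function names parse_state and d_state_threads are our own, not part of any existing tool.

```python
import os
from collections import Counter


def parse_state(stat_line: str) -> tuple[str, str]:
    """Return (comm, state) from one /proc/<pid>/task/<tid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')', so split on the LAST ')' instead of on whitespace.
    """
    close_idx = stat_line.rindex(")")
    comm = stat_line[stat_line.index("(") + 1:close_idx]
    state = stat_line[close_idx + 2]  # field 3 follows ") "
    return comm, state


def d_state_threads(proc_root: str = "/proc") -> Counter:
    """Count threads in uninterruptible sleep (state 'D'), grouped by comm.

    Walks every thread (not just every process), because the load average
    is computed per task/thread.
    """
    counts: Counter = Counter()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited or is not readable
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    comm, state = parse_state(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited between listdir() and open()
            if state == "D":
                counts[comm] += 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    blocked = d_state_threads()
    print("threads in D state:", sum(blocked.values()))
    for comm, n in blocked.most_common(10):
        print(f"{n:6d}  {comm}")
```

Run on the host (not inside a container), this also sees the containerized processes, so sampling it every few seconds while the load climbs should show whether the blocked threads belong to the application, to NFS-related kernel threads, or to something else entirely.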
[Screenshot: 10 minutes before outage]