We have a few instances of servers crashing under high I/O load on Rocky 9.x. The application uses an NFS share, and the CPU load builds up over time (about 20-30 minutes), causing the server to go unresponsive and ultimately causing outages.
Hardware: HPE Gen8 servers // 256 GB RAM // 48-core CPU // swap disabled
Role: Application servers running containerd with K8s
Behaviour:
We are seeing the application hang under heavy I/O, the CPU load climb above 600, and the server become unresponsive.
This seems to happen under high application load; the same application setup works fine on the CentOS servers (we are in a transition phase).
atop suggests the load is building up due to heavy reads/writes on the NFS mount, which also drives the local disk to high utilization. We are looking for help figuring out whether this could be a tuning/configuration issue.
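In case it is useful, here is a rough sketch (Python) of how the per-mount NFS traffic can be sampled from /proc/self/mountstats while the load builds; the "bytes:" field positions are assumed from the standard Linux NFS client stats layout, and nfsiostat from nfs-utils reports roughly the same counters.

```python
#!/usr/bin/env python3
"""Rough sketch: sample per-NFS-mount read/write bytes from /proc/self/mountstats.

Assumes the standard "bytes:" line layout of the Linux NFS client:
normal_read normal_write direct_read direct_write server_read server_write ...
"""
import time

INTERVAL = 10  # seconds between samples


def nfs_bytes():
    """Return {mountpoint: (bytes_read_from_server, bytes_written_to_server)}."""
    stats = {}
    mountpoint = None
    with open("/proc/self/mountstats") as f:
        for line in f:
            if line.startswith("device ") and "fstype nfs" in line:
                # e.g. "device filer:/export mounted on /mnt/share with fstype nfs4 ..."
                mountpoint = line.split(" mounted on ")[1].split(" with fstype")[0]
            elif mountpoint and line.strip().startswith("bytes:"):
                fields = line.split()[1:]
                # fields[4]/fields[5] = bytes read from / written to the server
                stats[mountpoint] = (int(fields[4]), int(fields[5]))
                mountpoint = None
    return stats


if __name__ == "__main__":
    before = nfs_bytes()
    time.sleep(INTERVAL)
    after = nfs_bytes()
    for mnt, (r1, w1) in after.items():
        r0, w0 = before.get(mnt, (r1, w1))
        print(f"{mnt}: read {(r1 - r0) / INTERVAL / 1e6:.1f} MB/s, "
              f"write {(w1 - w0) / INTERVAL / 1e6:.1f} MB/s")
```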
OK, so a Java web app would normally use threading for multiple requests, but your output shows multiple Java processes? Are they all "the application", or are there actually multiple applications? If you stop "the application" (or one of them), does the I/O problem go away? I'm guessing they all have the same issue and they're cumulative?
They appear to be cumulative. Too much data seems to be getting written to NFS; we are not sure whether the application is pushing too much data or whether some sort of buffer is filling up and ultimately causing the outage.
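To test the "buffer filling up" idea, something like the sketch below could watch dirty page-cache levels against the vm.dirty_* writeback thresholds while the load builds. It assumes the standard /proc/meminfo fields and /proc/sys/vm/ sysctls; the writeback explanation itself is only a guess at this point.

```python
#!/usr/bin/env python3
"""Rough sketch: watch dirty/writeback page-cache levels against vm.dirty_* limits.

Working assumption (not confirmed): writes to the NFS mount pile up in the page
cache, Dirty/Writeback climb until writeback throttling kicks in, and that is
the "buffer" filling up before the hang.
"""
import time


def meminfo_kb(*keys):
    """Return the requested /proc/meminfo values in kB."""
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            name, rest = line.split(":", 1)
            if name in keys:
                values[name] = int(rest.split()[0])
    return values


def vm_sysctl(name):
    with open("/proc/sys/vm/" + name) as f:
        return int(f.read())


if __name__ == "__main__":
    total_kb = meminfo_kb("MemTotal")["MemTotal"]
    # Percent-of-memory thresholds for background vs. blocking writeback
    # (vm.dirty_*_bytes, if set non-zero, override these ratios).
    bg_ratio = vm_sysctl("dirty_background_ratio")
    ratio = vm_sysctl("dirty_ratio")
    print(f"dirty_background_ratio={bg_ratio}%  dirty_ratio={ratio}%  "
          f"MemTotal={total_kb // 1024} MB")
    while True:
        m = meminfo_kb("Dirty", "Writeback")
        pct = 100.0 * m.get("Dirty", 0) / total_kb
        print(f"Dirty={m.get('Dirty', 0) // 1024} MB  "
              f"Writeback={m.get('Writeback', 0) // 1024} MB  "
              f"(Dirty ~{pct:.1f}% of RAM)")
        time.sleep(5)
```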