High I/O load causing server to crash

We have had a few instances of servers crashing under high I/O load on Rocky 9.x. The application uses an NFS share, and the CPU load builds up over time (about 20-30 minutes), making the server unresponsive and ultimately causing outages.
Hardware: HPE Gen8 servers // 256 GB RAM // 48-core CPU // swap disabled
Role: Application servers running containerd with K8S
Behaviour:

We see the application hang under heavy I/O; the load average climbs past 600 and the server becomes unresponsive.
This happens under high application load, and the same application setup works fine on our CentOS servers (we are mid-migration); see the note on comparing the NFS mounts below.
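
Since the same setup behaves on CentOS, one difference worth ruling out is the negotiated NFS protocol version and the effective mount options on each build; EL9 may negotiate a different default than the old CentOS hosts. A quick way to compare (/mnt/appdata is a placeholder for the real mount point):

    # Show each NFS mount with its negotiated version and effective options
    nfsstat -m
    # Or inspect a single mount point
    findmnt /mnt/appdata -o TARGET,SOURCE,FSTYPE,OPTIONS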

atop suggests the load is building up due to heavy reads and writes on the NFS mount, which also pushes local disk utilization high. We are looking for help determining whether this could be a tuning or configuration issue.
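
For what it's worth, a minimal sketch of the numbers that can be captured while the load builds (the 5-second interval is arbitrary; nfsiostat comes from the nfs-utils package):

    # Per-mount NFS read/write throughput and RTT, sampled every 5 seconds
    nfsiostat 5
    # Tasks in uninterruptible sleep (D state) -- these inflate the load average
    ps -eo state,pid,wchan:32,cmd | awk '$1 == "D"'

A load average of 600+ often reflects hundreds of tasks blocked in D state on I/O rather than runnable compute work.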

atop output ~10 minutes before the outage: [screenshot]

atop output ~20 minutes before the outage: [screenshot not attached; the forum would not let me add two screenshots to the description]

Define what you mean by “the application”.

@gerry666uk We run a Java-based web application on containerd + Kubernetes
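
To narrow down whether one java process (or pod) generates the writes or they all do, per-process I/O counters from /proc can help; the pid below is a placeholder:

    # Cumulative bytes passed through read()/write() syscalls by one process,
    # which includes I/O against NFS mounts (pid 12345 is a placeholder)
    grep -E '^(rchar|wchar)' /proc/12345/io
    # Or sample all processes doing I/O every 5 seconds (sysstat package)
    pidstat -d 5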

The server hangs, and the only solution is a forced reboot
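
Worth noting for the next occurrence: if the console accepts any input at all, the kernel's standard SysRq interface can dump stack traces of blocked tasks before the forced reboot, which usually shows what everything is stuck waiting on (run as root):

    # Enable all SysRq functions, then dump traces of D-state tasks to the kernel log
    echo 1 > /proc/sys/kernel/sysrq
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 80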

OK, so a Java web app would normally use threading to serve multiple requests, but your output shows multiple java processes. Are they all “the application”, or are there actually multiple applications? If you stop “the application” (or one of them), does the I/O problem go away? I'm guessing they all have the same issue and the effect is cumulative?

They appear to be cumulative. Too much data seems to be written to NFS; we are not sure whether the application is pushing too much data or whether some buffer is filling up and ultimately causing the outage
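
One concrete version of the "buffer getting full" theory: with 256 GB of RAM and swap disabled, the kernel's default writeback thresholds (vm.dirty_background_ratio / vm.dirty_ratio, typically 10% / 20% on EL9, higher under some tuned profiles) allow tens of gigabytes of dirty pages to accumulate before writers are throttled and flushed synchronously to the NFS server, stalling everything at once. A hedged starting point, with illustrative values that would need testing under load:

    # Cap dirty page cache in absolute bytes instead of a percentage of 256 GB
    # (setting the *_bytes sysctls overrides the *_ratio ones)
    sysctl -w vm.dirty_background_bytes=268435456   # start background writeback at 256 MB
    sysctl -w vm.dirty_bytes=1073741824             # throttle writers above 1 GB dirty

    # Watch dirty page growth while the load builds
    watch -n 5 "grep -E '^(Dirty|Writeback):' /proc/meminfo"

If Dirty/Writeback in /proc/meminfo climb steadily for 20-30 minutes before the hang, that would match the timeline described above.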
