Kernel errors for Rocky 9

We have Rocky 9.3 across all Dev, stage and prod env on vmware. We cannot upgrade them to 9.4 for now as there is product dependency. There are kernel errors across our work environment on about 400 servers. One after the other servers are going down frequently.

Cause is unknown and investigation is still on. Issues are specific to Rocky servers and no windows servers are impacted. There is no conclusion what these errors are actually related to. I’m posting some of the points that can be looked from OS point of view.

  1. Kernel I/O error
  2. Memory and CPU hangs
  3. Unix Drivers any to be installed?
  4. Can VMtools be upgraded to latest version without OS upgrade from 9.3 to 9.4.

Screenshots are provided with the errors and disk info. It would be helpful if there is anything underlying issue with kernel.





Rocky 9.4 is the only supported version, Rocky 9.3 is no longer supported. If the kernel is the issue, then you have to upgrade them since that kernel is now defunct and not getting updates. Or any of the other packages in 9.3.

How real is the dependency and what sane vendor would bind to platform (RHEL 9.3) that is known to live only six months?

The screenshots are horrible.

You can paste console output into the forum instead, which will then show clear text.

Many of the errors are complaining about disk sectors, but you also have ESET running, and you need to check if ESET is blocking I/O.

1 Like

Best answer.

Also he should try
smartctl -H /dev/sda , or smartctl --health /dev/sda

But eset seems to be the answer

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-362.24.1.el9_3.0.1.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

ESET scan errors when there is server hung or when files are unavailable for the scan.

Some of the features for Application is not working as expected in 9.4 and engineering is working on moving them to 9.4 soon once they have the stability. Until then we have to live with 9.3.

BUG: soft lock up CPU#0 stuck for 32 sec

This error is also seen on the console and the logs… Is this a known bug with this kernel version?

Getting the below OOM errors in logs. How do we install Pressure Stall Information (PSI) feature in Kernel and is it mandatory to enable PSI in Rocky?

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.