Significant performance drop moving from CentOS to Rocky

Hi,
I’m running an NFS server using nfs-ganesha. I used to run it on CentOS 7 and recently moved it to Rocky (8.8), where I’m seeing a significant performance drop: the number of operations per second is down by about 10-15%.

This workload has a very high CPU and network usage. These are mpstat results while running the workload:
CentOS:

04:02:01 PM  CPU    %usr   %nice    %sys    %iowait  %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    54.08   0.06    26.98     0.04    0.00  13.65   0.00    0.00    0.00     5.19

Rocky:

03:42:10 PM  CPU    %usr   %nice    %sys    %iowait  %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    47.82   0.09    29.43     0.05    2.17  12.77   0.00    0.00    0.00     7.67

We can see that on Rocky, about 6 percentage points less CPU time is spent in user space (47.82% vs. 54.08%). %sys and %idle are each about 2.5 points higher, and %irq went from exactly 0 to ~2%.
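For reference, the per-column differences can be computed directly from the two Average rows. This is just a small arithmetic sketch; the values are copied from the mpstat output above, and the helper name `deltas` is made up for illustration.

```python
# Average rows copied from the two mpstat runs above.
FIELDS = ["usr", "nice", "sys", "iowait", "irq", "soft",
          "steal", "guest", "gnice", "idle"]
centos = [54.08, 0.06, 26.98, 0.04, 0.00, 13.65, 0.00, 0.00, 0.00, 5.19]
rocky = [47.82, 0.09, 29.43, 0.05, 2.17, 12.77, 0.00, 0.00, 0.00, 7.67]


def deltas(before, after):
    """Return {field: after - before} in percentage points, rounded to 2 places."""
    return {f: round(a - b, 2) for f, b, a in zip(FIELDS, before, after)}


if __name__ == "__main__":
    # Print only the columns that actually changed.
    for field, diff in deltas(centos, rocky).items():
        if diff != 0:
            print(f"%{field:<7} {diff:+.2f}")
```

The differences are in percentage points of total CPU time, not relative changes.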

To me it looks like something changed in IRQ handling, which probably also increases kernel CPU usage. Does anyone have any idea what this could be?

Note that I’m running this workload on cloud VMs. The VMs themselves are completely identical in both cases.

Is there some reason you’re running Ganesha, as opposed to the NFS server built into Rocky?

Yes, there is. I’m not exporting a local directory via NFS. I’m trying to export a remote service that can’t be mounted, and Ganesha makes that easy with its FSAL model.

But that is not really the issue, since I see the poor performance regardless of what I’m exporting with Ganesha (I see it even when I export a local directory).

The README for nfs-ganesha says that it runs in user mode, so what’s with all the system CPU usage?

I assume this is usage by the kernel network stack, since this workload has a high networking load. But I don’t know how to verify this.
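One rough starting point, as a sketch only: take two snapshots of /proc/softirqs while the workload is running and see whether NET_RX and NET_TX dominate the growth. The function names below are hypothetical, and this only counts softirq events; it does not show where %sys time is spent, which would need a profiler.

```python
import time


def parse_softirqs(text):
    """Parse /proc/softirqs content into {softirq_name: total count over all CPUs}."""
    totals = {}
    for line in text.splitlines()[1:]:  # first line is the CPU header row
        name, _, counts = line.partition(":")
        if counts.strip():
            totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def softirq_delta(before, after):
    """Return how much each softirq counter grew between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


def sample_softirqs(interval_s=10, path="/proc/softirqs"):
    """Read the counters twice, interval_s seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_softirqs(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_softirqs(f.read())
    return softirq_delta(first, second)
```

Calling `sample_softirqs(10)` on both the CentOS and the Rocky VM during the same workload would give per-type counts to compare side by side.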

Specifically, what seems odd to me is that %irq changed from 0 to a nonzero value; this looks like a change in behavior between OS versions.