OK, there is a lot going on in this thread. Let me see what I can do to help, with willing community members doing the bisecting (I’m still waiting on hardware procurement).
- There is an issue with various Intel NICs (i210, 82574, E810) on Rocky 9.6 that didn’t happen on 9.5.
  - With the Intel OOT driver 2.3.1 (https://github.com/intel/ethernet-linux-ice), the RX issue goes away.
- Iwalker has a kernel panic on boot. I’m not going to chalk this up to the specific issue above; the ELRepo kernel-lt is 6.1, which is both ahead of and behind 9.6 depending on the subsystem.
  - If you have the kernel panic stack trace from that, I’m happy to look at it.
So, digging around a little: the ICE driver in the 9.6 kernel was synced to Linus’s 6.12 on Dec 11th 2024 (https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5841#note_65103814ea8888e03a4ad04bc49c303d18519230). Since then there have been 40 additional commits to the ICE driver. CIQ maintains a best-effort git reconstruction of the rocky9.6 kernel history here: https://github.com/ctrliq/kernel-src-tree/tree/rocky9_6
[jmaple@devbox ice]$ git log --oneline f9bd69b20328..HEAD . | wc -l
36
[jmaple@devbox ice]$ cat ../../../../../ciq/ciq_backports/kernel-5.14.0-570.16.1.el9_6/rebuild.details.txt | grep ice: | wc -l
5
(Yes, this totals 41, but some changes could not be cleanly cherry-picked from stream, so they basically get glommed together in a “final patch” that is the actual rpmbuild -bp of the src.rpm: this commit https://github.com/ctrliq/kernel-src-tree/commit/171ceb527773dcc959c7cf4884f64cf77c0722d5)
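For anyone who ends up bisecting, here is a minimal sketch of what that could look like against the CIQ tree. It assumes the in-tree driver path drivers/net/ethernet/intel/ice and that the tester already knows one ref that shows the RX issue and one that does not. Nothing above establishes which refs those are, so both are passed in as arguments. The function names are made up for illustration.

```shell
# Estimate how many kernels must be built and tested to bisect N commits
# (roughly ceil(log2(N)) halvings of the range).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Hypothetical helper: start a bisect limited to commits touching the ice driver.
# $1 = ref that shows the RX issue, $2 = ref that does not; both are assumptions
# the tester has to supply. Run from the top of the kernel tree.
start_ice_bisect() {
    git bisect start "$1" "$2" -- drivers/net/ethernet/intel/ice
}
```

With the 36 commits counted above, `bisect_steps 36` prints 6, so roughly six build-and-boot cycles. After each test, mark the result with `git bisect good` or `git bisect bad`, and run `git bisect reset` when done.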
We know that the Intel OOT driver solves the RX issue (or at least appears to). One of the problems is that the OOT driver is just blob dumps, and it’s hard to validate what is in it versus what is in Linus’s tree, still on the LKML, or not even pushed to the LKML yet. But it’s a good data point.
However, to look at the code more deeply, I need to understand what starts to work when, and to narrow down the possible change sets. Grepping through the upstream changelog, there wasn’t anything SUPER obvious around RX errors / drops among the net-new changes (i.e. commits without a Fixes: line).
For those with this issue who have resources for testing: since I don’t have access to this hardware yet, I need to lean on those who have time and non-prod resources.
- Can we try the latest of the following:
  - CentOS Stream 9 kernel: Index of /9-stream/BaseOS/x86_64/os/Packages
  - ELRepo kernel-ml: https://elrepo.org/wiki/doku.php?id=kernel-ml
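To make results comparable across the kernels above, it may help to record the same RX counters on each one. The following is only a sketch: it reads the standard per-interface counters under /sys/class/net, and the function name, the choice of counters, and the optional second argument (an alternate root directory, mainly for dry runs) are assumptions for illustration.

```shell
# Print selected RX counters for one interface as name=value lines.
# $1 = interface name, $2 = optional sysfs root (defaults to /sys/class/net).
rx_counters() {
    iface=$1
    root=${2:-/sys/class/net}
    for c in rx_packets rx_dropped rx_errors rx_missed_errors; do
        f="$root/$iface/statistics/$c"
        # Skip counters that are not present or not readable.
        if [ -r "$f" ]; then
            printf '%s=%s\n' "$c" "$(cat "$f")"
        fi
    done
}
```

For example, running `uname -r; ethtool -i eth0; rx_counters eth0` before and after a traffic run captures the kernel, the driver version, and the counter deltas in one paste (eth0 is a placeholder interface name).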
Thanks, I look forward to helping out where I can.