On multiple production systems with 5 network interfaces (Intel i210 NICs, and also some systems with older Intel 82574 NICs), we are now seeing the output of netstat -in and ip -s link show indicating TX drops, like so:
e4 is not connected (ip addr shows NO-CARRIER). The other interfaces are all connected. e3 has an embedded device which has powered off and on a few times, and the TX drops increment when that happens.
e0 and e1 are connected to COTS switches.
e2 is connected to a FreeBSD host (which shows no drops, errors or collisions in its netstat output).
The small numbers of TX drops for e0, e1, and e2 appeared early on, at some point while the interfaces were being initialized.
In RL9.5, these TX drops on the same physical hosts are consistently at 0 on all the systems (a dozen or so) where RL9.6 is showing drops.
I suspect this is just an accounting change, but I'd like to pin down what part of the stack changed (or what setting changed) that would trigger these drops.
ethtool -S <ifname> does not show any TX drops for the i210, but it does show tx_dropped values that match the netstat -in numbers for the older 82574 NICs.
kernel is 5.14.0-570.30.1.el9_6.x86_64
igb driver in use - all interfaces 1 Gbps.
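For cross-checking the tools against each other: netstat -in ultimately reads /proc/net/dev, so the raw counters can be pulled directly. A minimal sketch, assuming the standard /proc/net/dev column layout (the optional file argument is only there so a saved snapshot can be inspected too):

```shell
# Print per-interface TX drop counters from a /proc/net/dev-style file.
# After replacing the ':' following the interface name, field 13 is the
# TX "drop" column (8 RX fields, then TX bytes, packets, errs, drop).
tx_drops() {
  awk 'NR > 2 { gsub(/:/, " "); print $1, $13 }' "${1:-/proc/net/dev}"
}
```

Usage: `tx_drops` on a live box, or `tx_drops snapshot.txt` against a copy saved from a 9.5 boot, so the same parser is applied to both kernels.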
First question… has anyone else here seen this issue firsthand (zero TX drops on 9.5, non-zero drops on 9.6)?
Next question… does anyone have an inkling of what changed in the network stack that could trigger this?
Looking at the last column, where it says "TX-DR", I only see numbers for e3 and e4 (you already explained why those two are dropping). For e0, e1, and e2 the column is blank, so where are the drops?
It's not 1 or 3. Same hardware (inside and outside the box). Switching between RL9.5 and RL9.6, it is ALWAYS 0 TX drops on RL9.5 and non-zero on RL9.6.
If it's 2 (interface misconfig), then it is somehow caused by a change in the underlying code (network stack), because the /etc files are the same.
It could certainly be 4 (but I'll add: changes to kernel network code and/or userland tools like netstat(8) or ip(8)).
So I'm mainly leaning towards 4 and trying to figure out what changed between 9.5 and 9.6 that could be implicated. Hence the post here for help (and corroborating stories, positive or negative, from people who are running RL9.5 and RL9.6).
I have an old ProLiant DL380e that crashes with 9.6 kernels, and I'm stuck using 5.14.0-503.40.1.el9_5 on it, which was the last 9.5 kernel release. So my situation is far worse than yours.
I expect that if you still have a 9.5 kernel installed, like the version above, you'll probably find it also works fine, but as soon as you boot into a 9.6 kernel the problems start. In the end I just use kernel-lt from ELRepo instead: I get a 6.x kernel and much more hardware support than the default kernel in 9.x.
Evidently RHEL changed something in hardware support for 9.6 that is causing these issues.
Maybe try installing the ELRepo kernel-lt on yours as well and see if the problems go away:
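Something like the following should do it on EL9 (check elrepo.org for the current release package and instructions; run as root):

```shell
# Import the ELRepo signing key and install the EL9 release package,
# then pull kernel-lt from the elrepo-kernel repository.
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
dnf install https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
dnf --enablerepo=elrepo-kernel install kernel-lt
reboot   # then select the 6.x kernel-lt entry in GRUB if it is not the default
```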
Yep, I tried three or four of the subsequent kernels that came out afterwards, and the effect was still the same. I gave up trying further ones after that, since the ELRepo one worked. I could of course raise a bug report for it, but I'm not entirely bothered about it. Either it wasn't meant to happen, and is thus a regression, or it was intentional. I would have expected it to work at least until the EOL of EL9.
Since the server doesn't support x86_64-v3, I'll be retiring it in 2032 anyway, assuming it doesn't break before then.
In my earlier reply, I suggested running "ip -s link show e3", which might help you obtain more information.
In my (fully up to date) Rocky 9.6 running on a Dell PowerEdge server, "netstat -in" shows 21 RX-DRP (out of 549,188,830), but "ifconfig eno3" has more detail and shows those 21 as actually "MISSED".
Google Gemini states that "The missed statistic in the Linux command ip refers to packets that a network interface driver failed to process and dropped before they could be passed up the networking stack. This typically indicates a performance bottleneck where the system, often the CPU, isn't fast enough to handle the incoming packet rate."
You mention your computer is an old ProLiant DL380e, which Gemini tells me was released circa 2012. Could your "dropped" packets actually be "missed" packets due to the CPU?
One possibility is that 9.5 is giving inaccurate results and 9.6 is showing the real situation (unlikely, but possible). So maybe try using something other than netstat, first on 9.5 and then on 9.6: a raw kernel counter, the /proc filesystem, or some other tool.
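For raw kernel counters with no netstat or ip in the way, sysfs exposes per-interface statistics directly. A small sketch (the optional root argument only exists so the function can be pointed at a saved copy of the tree; on a live box, call it with no argument):

```shell
# List tx_dropped straight from the kernel's per-interface statistics
# under /sys/class/net/<ifname>/statistics/.
show_tx_dropped() {
  root="${1:-}"
  for dev in "$root"/sys/class/net/*; do
    [ -f "$dev/statistics/tx_dropped" ] || continue
    printf '%s %s\n' "${dev##*/}" "$(cat "$dev/statistics/tx_dropped")"
  done
}
```

If these sysfs numbers match netstat on 9.6, the kernel itself is counting the drops, and the userland tools are off the hook.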
Really interesting to see your charts. The y-axis labels are cut off; is the typical throughput a constant 1,000 Mbit/s? Google Gemini lists these three reasons for input discards, and I'd guess it is the first one, since the problem went away with the new RPM:
A full receive buffer: The NICās internal buffer is full and cannot store the incoming packet.
Invalid packet format: The packet is malformed and the NIC cannot properly read it.
Hardware limitations: The card is unable to process packets as fast as they are arriving due to hardware constraints.
So, maybe the new RPM is just faster at processing the receive buffers?
(and that would be helpful for the OP too, who stated having a circa-2012 server)
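If a full receive ring is the suspect, it might be worth checking whether the driver's RX ring is already at its hardware maximum; ethtool can show and, hardware permitting, enlarge it. Illustrative only: eno3 and the 4096 value are assumptions, so use your own interface name and whatever maximum your NIC reports.

```shell
ethtool -g eno3            # show current and maximum RX/TX ring sizes
ethtool -G eno3 rx 4096    # grow the RX ring toward the reported maximum
```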
OK, there are a lot of things going on in this thread; let me see what I can do to help with bisecting, together with willing community members (I'm waiting on hardware procurement).
There is an issue with various Intel NICs (i210, 82574, E810) on Rocky 9.6 that didn't happen on 9.5.
Iwalker has a kernel panic on boot. I'm not going to chalk this up to the specific issue above; the ELRepo kernel-lt is 6.1, which is both ahead of and behind 9.6, depending on the subsystem.
If you have the kernel panic stack trace from that, I'm happy to look at it.
We know that the Intel OOT driver solves the RX issue (or at least appears to do so). One of the problems is that the OOT driver is just blob dumps, and it's hard to validate what is in that OOT driver versus what is in Linus's tree, still on the LKML, or not yet even pushed to the LKML. But it's a good data point.
However, to look at the code deeper, I need to understand what starts to work when, to narrow down the possible change sets. Grepping through the upstream changelog, there wasn't anything SUPER obvious around RX errors/drops in the net-new changes (i.e. things without a Fixes: line in the commit).
For those with this issue who have resources for testing: since I don't have access to this hardware yet, I need to lean on those who have time and non-prod resources.
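For volunteers, something like the sketch below could standardize what gets collected, so that runs under 9.5 and 9.6 kernels can simply be diffed. The e0..e4 names follow this thread and are assumptions; substitute your own interfaces.

```shell
#!/bin/sh
# Snapshot the drop/miss counters discussed in this thread,
# one output file per booted kernel version.
collect_drops() {
  echo "kernel: $(uname -r)"
  for i in e0 e1 e2 e3 e4; do
    [ -d "/sys/class/net/$i" ] || continue
    echo "== $i =="
    ip -s link show "$i"
    ethtool -S "$i" 2>/dev/null | grep -Ei 'drop|miss' || true
  done
}
collect_drops > "drops-$(uname -r).txt"
```

Running it once on a 9.5 boot and once on a 9.6 boot on the same host, then diffing the two files, would pin down exactly which counters diverge.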