Hi
Since the release of Rocky Linux 9.7, I’ve been struggling with what I think is a kernel bug.
Back on 9.6, with kernel 5.14.0-570.58.1.el9_6.x86_64, listing my device with lspci showed the full set of capabilities:
lspci -Dvs '0000:02:00.0'
0000:02:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
Subsystem: Netronome Systems, Inc. Device 0611
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 26, NUMA node 0, IOMMU group 51
Memory at e0000000 (64-bit, prefetchable) [size=128M]
Memory at e8000000 (64-bit, prefetchable) [size=64M]
Memory at ec000000 (64-bit, prefetchable) [size=16M]
Expansion ROM at f4000000 [disabled] [size=16M]
Capabilities: [80] Power Management version 3
Capabilities: [b0] MSI-X: Enable+ Count=256 Masked-
Capabilities: [c0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Alternative Routing-ID Interpretation (ARI)
Capabilities: [150] Device Serial Number 00-15-4d-13-1a-b8-10-ff
Capabilities: [200] Single Root I/O Virtualization (SR-IOV)
Capabilities: [300] Secondary PCI Express
Kernel driver in use: nfp
Kernel modules: nfp
But now on Rocky Linux 9.7, with kernel 5.14.0-611.11.1.el9_7.x86_64, every capability from [100] onward is missing:
lspci -Dvs '0000:02:00.0'
0000:02:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
Subsystem: Netronome Systems, Inc. Device 0611
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 26, NUMA node 0, IOMMU group 51
Memory at e0000000 (64-bit, prefetchable) [size=128M]
Memory at e8000000 (64-bit, prefetchable) [size=64M]
Memory at ec000000 (64-bit, prefetchable) [size=16M]
Expansion ROM at f4000000 [disabled] [size=16M]
Capabilities: [80] Power Management version 3
Capabilities: [b0] MSI-X: Enable+ Count=256 Masked-
Capabilities: [c0] Express Endpoint, MSI 00
Kernel driver in use: nfp
Kernel modules: nfp
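To make the difference explicit, here is a small sketch that diffs the capability lines from the two outputs above. The helper names (parse_caps, missing_caps) are just mine for illustration, and only the Capabilities lines are pasted in. It shows that everything that disappeared sits at offset 0x100 or higher, i.e. in PCIe extended configuration space.

```python
import re

# Matches lines such as "Capabilities: [100] Advanced Error Reporting"
CAP_RE = re.compile(r"Capabilities: \[([0-9a-fA-F]+)\]\s*(.*)")

def parse_caps(lspci_text):
    """Return {offset: description} for each capability line in lspci -v output."""
    caps = {}
    for line in lspci_text.splitlines():
        m = CAP_RE.search(line)
        if m:
            caps[int(m.group(1), 16)] = m.group(2).strip()
    return caps

def missing_caps(before, after):
    """Capabilities present in `before` but absent from `after`, sorted by offset."""
    old, new = parse_caps(before), parse_caps(after)
    return [(off, old[off]) for off in sorted(old) if off not in new]

# Capability lines copied from the 9.6 output
OUT_96 = """\
Capabilities: [80] Power Management version 3
Capabilities: [b0] MSI-X: Enable+ Count=256 Masked-
Capabilities: [c0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Alternative Routing-ID Interpretation (ARI)
Capabilities: [150] Device Serial Number 00-15-4d-13-1a-b8-10-ff
Capabilities: [200] Single Root I/O Virtualization (SR-IOV)
Capabilities: [300] Secondary PCI Express
"""

# Capability lines copied from the 9.7 output
OUT_97 = """\
Capabilities: [80] Power Management version 3
Capabilities: [b0] MSI-X: Enable+ Count=256 Masked-
Capabilities: [c0] Express Endpoint, MSI 00
"""

for off, desc in missing_caps(OUT_96, OUT_97):
    region = "extended" if off >= 0x100 else "legacy"
    print(f"0x{off:03x} ({region}): {desc}")
```

The three capabilities below 0x100 survive on both kernels, so the loss is limited to the extended region.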
What are the proper channels to report this?
If I understand correctly, upstream would be Red Hat and/or CentOS Stream 9, where this kernel originates from? I don’t have a Red Hat account, so I’m not sure whether there is an avenue for me to report this.
I spent quite a bit of time digging into the bug, so I’ll list my findings here in case someone from Rocky Linux can reproduce or help report upstream.
I bisected the kernels in this tree: Red Hat / centos-stream / src / kernel / centos-stream-9 (GitLab)
The breakage happens on this commit:
In fs/sysfs/group.c, (*bin_attr)->size is copied into the local variable size, which is later passed to sysfs_add_bin_file_mode_ns.
However, the code at lines 83/84 has the side effect of modifying (*bin_attr)->size, so the copy is taken too early and still holds the value from before the modification.
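To illustrate the ordering problem as I understand it, here is a simplified Python model. It is not the kernel code: BinAttr, is_bin_visible, create_file_buggy and create_file_fixed are made-up names, and the sizes 256 and 4096 are stand-ins I picked to mirror legacy versus extended PCI config space, which is my assumption about why the capabilities at 0x100 and above vanish.

```python
from dataclasses import dataclass

@dataclass
class BinAttr:
    """Stand-in for a binary sysfs attribute (illustrative only)."""
    name: str
    size: int

def is_bin_visible(attr):
    """Hypothetical visibility callback with a side effect: it enlarges attr.size."""
    attr.size = 4096
    return 0o444  # the file mode; not important for this sketch

def create_file_buggy(attr):
    size = attr.size        # copied BEFORE the callback runs
    is_bin_visible(attr)    # side effect: attr.size is now 4096
    return size             # stale value is what gets registered

def create_file_fixed(attr):
    is_bin_visible(attr)    # let the callback adjust the size first
    size = attr.size        # copy AFTER the side effect
    return size

buggy = create_file_buggy(BinAttr("config", 256))
fixed = create_file_fixed(BinAttr("config", 256))
print("buggy size:", buggy)
print("fixed size:", fixed)
```

Reordering the copy is just one way to avoid the stale value in this model; the actual upstream fix may take a different approach.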
Upstream, this was fixed by this commit:
However, this fix has not been cherry-picked into the CentOS Stream 9 repo.
Any advice appreciated.
Regards,
Pico