Hello everybody,
today I updated a server to Rocky Linux 9.5, and during the boot process (which hung for around 1.5 minutes on one task) I saw some errors that worried me. After running dmesg I saw that these errors are related to InfiniBand, but the server doesn’t use InfiniBand. I have a 10G Ethernet card, and I have read that this card also supports InfiniBand, but the default should be Ethernet mode.
The dmesg output is:
[ 50.201717] bnxt_en 0000:43:00.0 bnxt_re0: Failed to modify HW QP
[ 50.201741] infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
[ 50.201764] infiniband bnxt_re0: Couldn't start port
[ 50.202910] bnxt_en 0000:43:00.0 bnxt_re0: Failed to destroy HW QP
[ 50.202972] ------------[ cut here ]------------
[ 50.202987] WARNING: CPU: 1 PID: 1437 at drivers/infiniband/core/cq.c:322 ib_free_cq+0xf2/0x130 [ib_core]
[ 50.203018] Modules linked in: ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm bnxt_re(+) ast i2c_algo_bit wmi_bmof ib_uverbs acpi_cpufreq pcspkr rapl drm_shmem_helper acpi_ipmi ses ipmi_si enclosure ch drm_kms_helper ib_core ipmi_devintf k10temp i2c_piix4 ptdma ipmi_msghandler joydev drm nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c sd_mod raid1 crct10dif_pclmul crc32_pclmul crc32c_intel mpt3sas ahci nvme libahci raid_class ghash_clmulni_intel scsi_transport_sas bnxt_en nvme_core libata ccp nvme_auth t10_pi sp5100_tco wmi rndis_host cdc_ether usbnet mii zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) st sg fuse
[ 50.203075] CPU: 1 PID: 1437 Comm: systemd-udevd Tainted: P OE ------- --- 5.14.0-503.15.1.el9_5.x86_64 #1
[ 50.203078] Hardware name: Supermicro AS -1114S-WN10RT/H12SSW-NTR, BIOS 2.7 10/25/2023
[ 50.203080] RIP: 0010:ib_free_cq+0xf2/0x130 [ib_core]
[ 50.203098] Code: 08 48 89 ee e8 1f 61 02 00 65 ff 0d 70 c9 ae 3e 75 81 0f 1f 44 00 00 e9 77 ff ff ff 48 8d 7f 50 e8 f3 ab 7e de e9 46 ff ff ff <0f> 0b e9 52 e0 5b df 0f 0b 5d e9 4a e0 5b df 80 3d d6 25 03 00 00
[ 50.203100] RSP: 0018:ffffb915c931b840 EFLAGS: 00010202
[ 50.203102] RAX: 0000000000000002 RBX: ffff9e4cc4a00000 RCX: 0000000000000000
[ 50.203104] RDX: 0000000000000000 RSI: ffff9e8a8e8608c0 RDI: ffff9e4c4fdbb000
[ 50.203105] RBP: ffff9e4c5bce6000 R08: 0000000000000000 R09: ffffb915c931b570
[ 50.203106] R10: ffffb915c931b568 R11: ffffffffa1de93e8 R12: 00000000ffffff92
[ 50.203107] R13: 0000000000000246 R14: ffff9e4c5bce68f8 R15: ffff9e4c5bce6870
[ 50.203109] FS: 00007fa84118ab40(0000) GS:ffff9e8a8e840000(0000) knlGS:0000000000000000
[ 50.203110] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 50.203112] CR2: 00007f567512e4e0 CR3: 000000014694a001 CR4: 0000000000770ef0
[ 50.203113] PKRU: 55555554
[ 50.203114] Call Trace:
[ 50.203118] <TASK>
[ 50.203119] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203124] ? show_trace_log_lvl+0x26e/0x2df
[ 50.203131] ? show_trace_log_lvl+0x26e/0x2df
[ 50.203137] ? ib_mad_port_open+0x267/0x3f0 [ib_core]
[ 50.203160] ? ib_free_cq+0xf2/0x130 [ib_core]
[ 50.203176] ? __warn+0x7e/0xd0
[ 50.203180] ? ib_free_cq+0xf2/0x130 [ib_core]
[ 50.203196] ? report_bug+0x100/0x140
[ 50.203201] ? handle_bug+0x3c/0x70
[ 50.203205] ? exc_invalid_op+0x14/0x70
[ 50.203207] ? asm_exc_invalid_op+0x16/0x20
[ 50.203212] ? ib_free_cq+0xf2/0x130 [ib_core]
[ 50.203228] ib_mad_port_open+0x267/0x3f0 [ib_core]
[ 50.203247] ib_mad_init_device+0x51/0xc0 [ib_core]
[ 50.203265] add_client_context+0x110/0x1b0 [ib_core]
[ 50.203284] enable_device_and_get+0xd7/0x1e0 [ib_core]
[ 50.203301] ib_register_device+0xe7/0x160 [ib_core]
[ 50.203319] bnxt_re_ib_init+0x143/0x160 [bnxt_re]
[ 50.203333] bnxt_re_probe+0x141/0x1b0 [bnxt_re]
[ 50.203342] ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
[ 50.203349] auxiliary_bus_probe+0x45/0x80
[ 50.203353] ? driver_sysfs_add+0x59/0xc0
[ 50.203357] really_probe+0xe1/0x390
[ 50.203360] ? pm_runtime_barrier+0x50/0x90
[ 50.203363] __driver_probe_device+0xd6/0x130
[ 50.203367] driver_probe_device+0x1e/0x90
[ 50.203370] __driver_attach+0xd2/0x1c0
[ 50.203373] ? __pfx___driver_attach+0x10/0x10
[ 50.203375] bus_for_each_dev+0x78/0xd0
[ 50.203379] bus_add_driver+0xc2/0x1f0
[ 50.203383] driver_register+0x70/0xd0
[ 50.203386] __auxiliary_driver_register+0x6a/0xd0
[ 50.203389] ? __pfx_init_module+0x10/0x10 [bnxt_re]
[ 50.203397] bnxt_re_mod_init+0x3b/0xff0 [bnxt_re]
[ 50.203404] do_one_initcall+0x44/0x210
[ 50.203409] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203412] ? kmalloc_trace+0x25/0xa0
[ 50.203417] do_init_module+0x64/0x230
[ 50.203422] __do_sys_init_module+0x12e/0x1b0
[ 50.203428] do_syscall_64+0x5f/0xf0
[ 50.203434] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203436] ? __mod_memcg_lruvec_state+0x76/0xc0
[ 50.203441] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203443] ? __mod_lruvec_page_state+0x97/0x160
[ 50.203446] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203448] ? folio_add_new_anon_rmap+0x44/0xe0
[ 50.203452] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203454] ? do_anonymous_page+0x25a/0x410
[ 50.203457] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203460] ? __handle_mm_fault+0x2fb/0x690
[ 50.203463] ? nohz_balancer_kick+0x31/0x240
[ 50.203469] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203471] ? __count_memcg_events+0x4f/0xb0
[ 50.203472] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203474] ? mm_account_fault+0x6c/0x100
[ 50.203478] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203480] ? handle_mm_fault+0x116/0x270
[ 50.203482] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203484] ? do_user_addr_fault+0x1d6/0x6a0
[ 50.203488] ? srso_alias_return_thunk+0x5/0xfbef5
[ 50.203490] ? exc_page_fault+0x62/0x150
[ 50.203493] entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 50.203495] RIP: 0033:0x7fa841d0f01e
[ 50.203517] Code: 48 8b 0d fd 9d 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca 9d 0e 00 f7 d8 64 89 01 48
[ 50.203518] RSP: 002b:00007ffd1917cf08 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 50.203520] RAX: ffffffffffffffda RBX: 0000561c2715fd40 RCX: 00007fa841d0f01e
[ 50.203521] RDX: 00007fa8423b932c RSI: 000000000006db86 RDI: 0000561c27a32aa0
[ 50.203522] RBP: 0000561c27a32aa0 R08: 0000561c27168a60 R09: 000000000006c010
[ 50.203523] R10: 0000000000000005 R11: 0000000000000246 R12: 00007fa8423b932c
[ 50.203524] R13: 0000561c27178a00 R14: 0000000000000007 R15: 0000561c27169b70
[ 50.203527] </TASK>
[ 50.203528] ---[ end trace 0000000000000000 ]---
[ 50.203531] bnxt_en 0000:43:00.0 bnxt_re0: Free MW failed: 0xffffff92
[ 50.230787] infiniband bnxt_re0: Couldn't open port 1
[ 50.231303] infiniband bnxt_re0: Device registered with IB successfully
[ 91.161675] bnxt_en 0000:43:00.1: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (40888 > 40000) msec active 1
[ 91.162408] bnxt_en 0000:43:00.1 bnxt_re1: Failed to modify HW QP
[ 91.163075] infiniband bnxt_re1: Couldn't change QP1 state to INIT: -110
[ 91.163553] infiniband bnxt_re1: Couldn't start port
[ 91.164692] bnxt_en 0000:43:00.1 bnxt_re1: Failed to destroy HW QP
[ 91.165179] bnxt_en 0000:43:00.1 bnxt_re1: Free MW failed: 0xffffff92
[ 91.165498] infiniband bnxt_re1: Couldn't open port 1
[ 91.166028] infiniband bnxt_re1: Device registered with IB successfully
[ 91.180881] XFS (md126): Mounting V5 Filesystem 2f004d25-f83e-4473-94e2-bf5d544112b5
[ 91.182397] XFS (md123): Mounting V5 Filesystem 4c75560e-87bf-499d-84d9-bdf8ea51c740
[ 91.193783] XFS (md123): Ending clean mount
[ 91.205241] md124:
[ 91.370594] XFS (md126): Ending clean mount
[ 96.216133] evm: overlay not supported
[ 96.378151] Warning: Unmaintained driver is detected: ip_set
[ 96.576894] bnxt_en 0000:43:00.0 eno1np0: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[ 96.577506] bnxt_en 0000:43:00.0 eno1np0: EEE is not active
[ 96.577878] bnxt_en 0000:43:00.0 eno1np0: FEC autoneg off encoding: None
[ 96.749368] bnxt_en 0000:43:00.1 eno2np1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[ 96.750123] bnxt_en 0000:43:00.1 eno2np1: EEE is not active
[ 96.750600] bnxt_en 0000:43:00.1 eno2np1: FEC autoneg off encoding: None
[ 96.792985] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 96.881166] br0: port 1(bond0) entered blocking state
[ 96.881620] br0: port 1(bond0) entered disabled state
[ 96.882121] device bond0 entered promiscuous mode
[ 96.882529] br0: port 1(bond0) entered blocking state
[ 96.882885] br0: port 1(bond0) entered forwarding state
[ 96.883385] br0: port 1(bond0) entered disabled state
[ 96.971207] bnxt_en 0000:43:00.0 eno1np0: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[ 96.971893] bnxt_en 0000:43:00.0 eno1np0: EEE is not active
[ 96.972311] bnxt_en 0000:43:00.0 eno1np0: FEC autoneg off encoding: None
[ 96.973704] bnxt_en 0000:43:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[ 96.973708] device eno1np0 entered promiscuous mode
[ 96.973711] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[ 96.973716] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a0a:0013 error=-110
[ 96.973720] bnxt_en 0000:43:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[ 96.973723] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[ 96.973725] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a0a:0013 error=-110
[ 96.973783] bond0: (slave eno1np0): Enslaving as a backup interface with an up link
[ 97.030469] bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
[ 97.031235] infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=0
[ 97.031709] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:3eec:efff:fe97:7c62 error=-110
[ 97.032142] bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
[ 97.032553] infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=0
[ 97.032970] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:3eec:efff:fe97:7c62 error=-110
[ 97.072530] bnxt_en 0000:43:00.1 eno2np1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[ 97.073327] bnxt_en 0000:43:00.1 eno2np1: EEE is not active
[ 97.073807] bnxt_en 0000:43:00.1 eno2np1: FEC autoneg off encoding: None
[ 97.074823] bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
[ 97.074881] device eno2np1 entered promiscuous mode
[ 97.075462] infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=0
[ 97.076088] bond0: (slave eno2np1): Enslaving as a backup interface with an up link
[ 97.076623] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:3eec:efff:fe97:7c62 error=-110
[ 97.077471] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 97.077833] bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
[ 97.077836] infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=0
[ 97.078427] br0: port 1(bond0) entered blocking state
[ 97.078953] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:3eec:efff:fe97:7c62 error=-110
[ 97.079463] br0: port 1(bond0) entered forwarding state
[ 97.080025] bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
[ 97.081499] infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=0
[ 97.081901] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:3eec:efff:fe97:7c62 error=-110
[ 97.082298] bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
[ 97.082695] infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=0
[ 97.083083] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:3eec:efff:fe97:7c62 error=-110
[ 97.083488] bnxt_en 0000:43:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[ 97.083892] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[ 97.084283] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a0a:0013 error=-110
[ 97.084687] bnxt_en 0000:43:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[ 97.085081] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[ 97.085479] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a0a:0013 error=-110
[ 97.085891] bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
[ 97.086295] infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=2
[ 97.087151] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a0a:0013 error=-110
[ 97.087659] bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
[ 97.088101] infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=2
[ 97.088534] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a0a:0013 error=-110
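If I read the values correctly, the hex code 0xffffff92 in the "Free MW failed" and "Failed to add GID" lines is the same error as the -110 in the other lines, just printed as an unsigned 32-bit number. On Linux, errno 110 is ETIMEDOUT, which would fit the "FW STALL Detected" message. A minimal sketch of how I checked this (the helper name to_signed32 is my own, not something from the driver):

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit unsigned value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


# Assumption: Linux errno numbering, where 110 is ETIMEDOUT.
LINUX_ETIMEDOUT = 110

code = to_signed32(0xFFFFFF92)
print(code)                      # -110
print(-code == LINUX_ETIMEDOUT)  # True
```

So all of these messages seem to report the same timeout, not several different errors.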
Card info:
# lspci | grep Ethernet
43:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
43:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
Loaded modules:
# lsmod | grep bnxt
bnxt_re 188416 0
ib_uverbs 208896 1 bnxt_re
ib_core 557056 6 rdma_cm,rpcrdma,iw_cm,bnxt_re,ib_uverbs,ib_cm
bnxt_en 425984 1 bnxt_re
ethtool output:
# ethtool -i eno1np0
driver: bnxt_en
version: 5.14.0-503.15.1.el9_5.x86_64
firmware-version: 218.0.153.0/pkg 218.0.169.0
expansion-rom-version:
bus-info: 0000:43:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
RDMA config:
# rdma link
link bnxt_re0/1 state ACTIVE physical_state LINK_UP netdev eno1np0
link bnxt_re1/1 state ACTIVE physical_state LINK_UP netdev eno2np1
It looks like the network is running normally. I remember that in the past I also saw InfiniBand errors in the server logs (after a reboot), but because I don’t use it and the network was working, I never looked at them more closely.
Is this something I have to worry about, and can I fix it somehow?
Have a good day!
Jonathan