KVM kernel panic 4.18.0-372.32.1.el8_6.x86_64

I patched and rebooted my VM host (Rocky 8) and noticed that it panicked.

Fortunately I have the kernel configured for a serial console and have conserver logging output, so I can capture it :slight_smile: I include a few lines from before the crash.

[   34.584466] NFSD: Using nfsdcld client tracking operations.
[   34.590146] NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000098)
[   34.994431] br-lan: port 3(v-plexserver) entered blocking state
[   35.000483] br-lan: port 3(v-plexserver) entered disabled state
[   35.006549] device v-plexserver entered promiscuous mode
[   35.012092] br-lan: port 3(v-plexserver) entered blocking state
[   35.012383] int3: 0000 [#1] SMP PTI
[   35.012384] CPU: 2 PID: 3173 Comm: CPU 0/KVM Tainted: G          IOE    --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1
[   35.012385] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H55M Pro, BIOS P1.30 01/06/2010
[   35.012386] RIP: 0010:setno+0xa/0x10 [kvm]
[   35.012386] Code: db 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 0f 90 c0 e9 b8 9e 59 db cc cc cc cc cc cc cc cc 0f 91 c0 e9 a8 9e 59 db cc cc <cc> cc cc cc cc cc 0f 92 c0 e9 98 9e 59 db cc cc cc cc cc cc cc cc
[   35.012387] RSP: 0018:ffffa6b102673c20 EFLAGS: 00000286
[   35.012389] RAX: 0000000000000285 RBX: ffff8c3316f1bea0 RCX: ffffffffc0872657
[   35.012389] RDX: ffffffffc0867594 RSI: 0000000000000000 RDI: ffff8c3316f1bea0
[   35.012390] RBP: 0000000000000006 R08: ffff8c3316fd4000 R09: 0000000000000000
[   35.012391] R10: 0000000000000230 R11: 0000000000000001 R12: ffffffffc089ee20
[   35.012391] R13: 0000000000000000 R14: ffff8c3316f1bea0 R15: ffff8c3316fd4000
[   35.012392] FS:  00007f279b3fe700(0000) GS:ffff8c360fc80000(0000) knlGS:0000000000000000
[   35.012393] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   35.012393] CR2: 0000000000000000 CR3: 0000000115908000 CR4: 00000000000026e0
[   35.012394] Call Trace:
[   35.012394]  ? x86_decode_emulated_instruction+0x5a/0x210 [kvm]
[   35.012395]  ? x86_emulate_instruction+0x2f2/0x560 [kvm]
[   35.012395]  ? emulator_pio_in+0x30/0x70 [kvm]
[   35.012396]  ? vmx_handle_exit+0x36d/0x7a0 [kvm_intel]
[   35.012396]  ? vcpu_enter_guest+0xabb/0x1730 [kvm]
[   35.012397]  ? vmx_set_rflags+0xb3/0x240 [kvm_intel]
[   35.012397]  ? x86_emulate_instruction+0x47b/0x560 [kvm]
[   35.012398]  ? vmx_vcpu_load+0x27/0x40 [kvm_intel]
[   35.012398]  ? kvm_arch_vcpu_ioctl_run+0xff/0x5f0 [kvm]
[   35.012399]  ? kvm_vcpu_ioctl+0x2cc/0x640 [kvm]
[   35.012399]  ? __handle_mm_fault+0x7aa/0x7f0
[   35.012400]  ? do_vfs_ioctl+0xa4/0x690
[   35.012400]  ? handle_mm_fault+0xc1/0x1e0
[   35.012401]  ? syscall_trace_enter+0x1ff/0x2d0
[   35.012401]  ? ksys_ioctl+0x64/0xa0
[   35.012401]  ? __x64_sys_ioctl+0x16/0x20
[   35.012402]  ? do_syscall_64+0x5b/0x1b0
[   35.012402]  ? entry_SYSCALL_64_after_hwframe+0x61/0xc6
[   35.012403] Modules linked in: vhost_net vhost vhost_iotlb tap tun nfnetlink bridge 8021q garp mrp stp llc xfs dm_mod raid456 async_raid6_recov async_memcpy intel_powerclamp async_pq async_xor coretemp xor snd_hda_codec_via async_tx kvm_intel snd_hda_codec_generic ledtrig_audio kvm iTCO_wdt gpio_ich iTCO_vendor_support ppdev snd_hda_intel raid6_pq irqbypass libcrc32c intel_cstate intel_uncore snd_intel_dspcfg pcspkr snd_intel_sdw_acpi i7core_edac i2c_i801 snd_hda_codec btusb btrtl btbcm btintel snd_hda_core snd_hwdep snd_seq bluetooth snd_seq_device snd_pcm snd_timer ecdh_generic ftdi_sio rfkill snd lpc_ich soundcore parport_pc parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ext4 mbcache jbd2 raid1 sd_mod t10_pi sg drm_kms_helper syscopyarea ahci sysfillrect libahci sysimgblt fb_sys_fops firewire_ohci drm crc32c_intel libata serio_raw r8169 firewire_core crc_itu_t i2c_algo_bit mpt3sas(OE) realtek raid_class scsi_transport_sas
[   35.012442] ---[ end trace cc08601f572ecfb8 ]---
[   35.012462] RIP: 0010:setno+0xa/0x10 [kvm]
[   35.012465] Code: db 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 0f 90 c0 e9 b8 9e 59 db cc cc cc cc cc cc cc cc 0f 91 c0 e9 a8 9e 59 db cc cc <cc> cc cc cc cc cc 0f 92 c0 e9 98 9e 59 db cc cc cc cc cc cc cc cc
[   35.012467] RSP: 0018:ffffa6b102673c20 EFLAGS: 00000286
[   35.012469] 
[   35.012471] RAX: 0000000000000285 RBX: ffff8c3316f1bea0 RCX: ffffffffc0872657
[   35.012472] RDX: ffffffffc0867594 RSI: 0000000000000000 RDI: ffff8c3316f1bea0
[   35.012474] RBP: 0000000000000006 R08: ffff8c3316fd4000 R09: 0000000000000000
[   35.012476] R10: 0000000000000230 R11: 0000000000000001 R12: ffffffffc089ee20
[   35.012478] R13: 0000000000000000 R14: ffff8c3316f1bea0 R15: ffff8c3316fd4000
[   35.012480] FS:  00007f279b3fe700(0000) GS:ffff8c360fc80000(0000) knlGS:0000000000000000
[   35.012482] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   35.012484] CR2: 0000000000000000 CR3: 0000000115908000 CR4: 00000000000026e0
[   35.012486] Kernel panic - not syncing: Fatal exception in interrupt
[   35.018096] Kernel Offset: 0x1a200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
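The `Code:` line in the oops can be decoded by hand. Below is a minimal Python sketch that does it; the helper name `parse_code_line` is made up for illustration, and it assumes the usual oops convention that the byte wrapped in `<..>` is the one at the reported RIP. It shows that this byte is `0xcc`, the one-byte x86 INT3 opcode, which matches the `int3: 0000 [#1]` header. The bytes from the start of `setno` (RIP minus `0xa`) are `0f 91 c0` (`setno %al`), then an `e9` near jump, then `cc` filler. That looks consistent with execution landing in padding rather than on a real instruction, but that is an interpretation and not something the log states.

```python
import re

# "Code:" line copied from the oops above (timestamp prefix removed).
CODE_LINE = (
    "Code: db 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 "
    "0f 90 c0 e9 b8 9e 59 db cc cc cc cc cc cc cc cc "
    "0f 91 c0 e9 a8 9e 59 db cc cc <cc> cc cc cc cc cc "
    "0f 92 c0 e9 98 9e 59 db cc cc cc cc cc cc cc cc"
)


def parse_code_line(line):
    """Return (code bytes, index of the byte marked with <..>)."""
    tokens = line.split(":", 1)[1].split()
    values = []
    marked = None
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if m:
            marked = i
            tok = m.group(1)
        values.append(int(tok, 16))
    if marked is None:
        raise ValueError("no <..> marker found in Code: line")
    return bytes(values), marked


code, rip_index = parse_code_line(CODE_LINE)

# RIP is reported as setno+0xa, so setno itself starts 0xa bytes earlier.
setno_start = rip_index - 0xA

print(f"byte at RIP: {code[rip_index]:#04x}")
print("setno up to RIP:", code[setno_start:rip_index + 1].hex(" "))
```
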

For reference, I have everything on bridges; br-lan is my main network, and v-plexserver is the virtual interface used by the plexserver VM that’s joined to the bridge. I have 6 VMs that auto-start, and across multiple boots the VMs join the bridge in different orders. On the first boot after patching, v-brains was the last VM to join the bridge before the panic. On this crash it was v-plexserver, and v-mercury7 had joined 1 second earlier.

If I go back to kernel 4.18.0-372.26.1.el8_6.x86_64 then the machine boots and runs cleanly.

All the VMs are either CentOS7 or Rocky8, all fully patched.

It’s interesting that the kernel has el8_6 suffix, but I thought 8.7 had been released?

Rocky 8.7 hasn’t been released yet. RHEL 8.7 has, a day or two ago. Rocky’s is in progress.

Indeed. RHEL 8.7 was released … Wikipedia says “Monday”, someone saw files on Tuesday, Release Notes appeared Wednesday. (With initial kernel version 4.18.0-425.*.el8.) Rocky is still building?

The kernel panics when the (libvirt) services start. That means the initial “init in initramfs” phase is already over, so a flaw there should not be the root cause?

Nevertheless, two obvious things to do are (1) reinstall the 372.32.1 kernel, and (2) boot with VM autostarts disabled. If you can boot without VMs but it panics when you start them manually, then there is one thing to look at. If the panic is still there even without VMs, then they might be innocent.
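For step (2), a minimal Python sketch of switching autostart off per domain with `virsh autostart <domain> --disable`. The function names are made up for illustration. Of the domain names, only `plexserver` appears in the thread; the others are guesses based on the interface names. It only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def autostart_commands(domains, enable=False):
    """Build one `virsh autostart` command per libvirt domain."""
    cmds = []
    for name in domains:
        cmd = ["virsh", "autostart", name]
        if not enable:
            cmd.append("--disable")
        cmds.append(cmd)
    return cmds


def run_commands(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical domain names (only "plexserver" is named in the thread).
DOMAINS = ["plexserver", "brains", "mercury7"]

# Dry run: shows what would be executed without touching libvirt.
run_commands(autostart_commands(DOMAINS))
```

Calling `autostart_commands(DOMAINS, enable=True)` builds the commands that turn autostart back on once testing is done.
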

Yeah. I think I’ll wait until 8.7 is released, and try that. Just in case there was an unknown race condition in the 372.32.1 kernel that has been “fixed” in the newer one (or changes elsewhere avoid it).

If that also panics then I’ll try some of the things you’ve written (e.g. don’t use autostart but have my own process to start things up more slowly; it’ll depend on the results!)
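A “start things up more slowly” process could look roughly like the Python sketch below. It assumes the VMs are libvirt domains started with `virsh start`, and that a fixed pause between starts is enough; the function name, the 30-second default and the domain names in the usage note are all made up. The `run` and `sleep` parameters exist only so the logic can be exercised without libvirt.

```python
import subprocess
import time


def start_staggered(domains, delay_s=30.0, run=subprocess.run, sleep=time.sleep):
    """Start domains one at a time, pausing delay_s seconds between starts."""
    started = []
    for i, name in enumerate(domains):
        if i > 0:
            # Give the previous VM time to settle before starting the next.
            sleep(delay_s)
        # check=True makes a failed `virsh start` raise and stop the loop.
        run(["virsh", "start", name], check=True)
        started.append(name)
    return started
```

Usage would be something like `start_staggered(["plexserver", "brains"], delay_s=60)`, run from a unit or script after libvirtd is up, with autostart disabled on the domains themselves.
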

Well, 8.7 booted cleanly. That means either there was an issue with the .32 kernel, or there’s some other issue hiding somewhere and the timings changed just enough to avoid it :slight_smile: