Latest update RockyLinux v8.5 causes network issue?

My Rocky Linux v8.5 system just now complained as follows:

We're sorry, it looks like BOOT_IMAGE=(hd0,gpt5)/vmlinuz-4.18.0-348.20.1.el8_5.x86_64 crashed. Please contact the developer if you want to report the issue. 

This is shortly after a reboot. I rebooted because I lost all network connectivity from the machine.

Posting a topic on this board is the closest thing to “contact the developer if you want to report the issue” that I have.

I invite guidance about what, if anything, I should do about this.

More details follow.

The current response to hostnamectl on the system is as follows:

[root@tms-desktop ~]# hostnamectl
   Static hostname: tms-desktop
         Icon name: computer-desktop
           Chassis: desktop
        Machine ID: 8c46f0db9ab242eea5f3c0f27b348cff
           Boot ID: b5b826ec9162419a93c5da8b3e4593cf
  Operating System: Rocky Linux 8.5 (Green Obsidian)
       CPE OS Name: cpe:/o:rocky:rocky:8:GA
            Kernel: Linux 4.18.0-348.20.1.el8_5.x86_64
      Architecture: x86-64

After the usual notification yesterday (15-Mar-2022), I installed updates. Here’s the session:

[root@tms-desktop ~]# dnf update
Last metadata expiration check: 0:29:59 ago on Tue 15 Mar 2022 04:53:45 PM EDT.
Dependencies resolved.
=================================================================================================================
 Package                        Architecture     Version                           Repository               Size
=================================================================================================================
Installing:
 kernel                         x86_64           4.18.0-348.20.1.el8_5             baseos                  7.0 M
 kernel-core                    x86_64           4.18.0-348.20.1.el8_5             baseos                   38 M
 kernel-devel                   x86_64           4.18.0-348.20.1.el8_5             baseos                   20 M
 kernel-modules                 x86_64           4.18.0-348.20.1.el8_5             baseos                   30 M
 kernel-modules-extra           x86_64           4.18.0-348.20.1.el8_5             baseos                  7.7 M
Upgrading:
 bpftool                        x86_64           4.18.0-348.20.1.el8_5             baseos                  7.7 M
 firefox                        x86_64           91.7.0-3.el8_5                    appstream               106 M
 google-chrome-stable           x86_64           99.0.4844.74-1                    google-chrome            79 M
 kernel-headers                 x86_64           4.18.0-348.20.1.el8_5             baseos                  8.3 M
 kernel-tools                   x86_64           4.18.0-348.20.1.el8_5             baseos                  7.2 M
 kernel-tools-libs              x86_64           4.18.0-348.20.1.el8_5             baseos                  7.0 M
 python3-perf                   x86_64           4.18.0-348.20.1.el8_5             baseos                  7.1 M
Removing:
 kernel                         x86_64           4.18.0-348.2.1.el8_5              @baseos                   0  
 kernel-core                    x86_64           4.18.0-348.2.1.el8_5              @baseos                  68 M
 kernel-devel                   x86_64           4.18.0-348.2.1.el8_5              @baseos                  49 M
 kernel-modules                 x86_64           4.18.0-348.2.1.el8_5              @baseos                  22 M
 kernel-modules-extra           x86_64           4.18.0-348.2.1.el8_5              @baseos                 667 k

Transaction Summary
=================================================================================================================
Install  5 Packages
Upgrade  7 Packages
Remove   5 Packages

Total download size: 324 M
Is this ok [y/N]: y 
Downloading Packages:
(1/12): kernel-4.18.0-348.20.1.el8_5.x86_64.rpm                                  7.6 MB/s | 7.0 MB     00:00    
(2/12): kernel-devel-4.18.0-348.20.1.el8_5.x86_64.rpm                             16 MB/s |  20 MB     00:01    
(3/12): kernel-modules-extra-4.18.0-348.20.1.el8_5.x86_64.rpm                     22 MB/s | 7.7 MB     00:00    
(4/12): kernel-modules-4.18.0-348.20.1.el8_5.x86_64.rpm                           26 MB/s |  30 MB     00:01    
(5/12): bpftool-4.18.0-348.20.1.el8_5.x86_64.rpm                                  27 MB/s | 7.7 MB     00:00    
(6/12): kernel-headers-4.18.0-348.20.1.el8_5.x86_64.rpm                           29 MB/s | 8.3 MB     00:00    
(7/12): kernel-core-4.18.0-348.20.1.el8_5.x86_64.rpm                              13 MB/s |  38 MB     00:02    
(8/12): kernel-tools-4.18.0-348.20.1.el8_5.x86_64.rpm                             21 MB/s | 7.2 MB     00:00    
(9/12): python3-perf-4.18.0-348.20.1.el8_5.x86_64.rpm                             26 MB/s | 7.1 MB     00:00    
(10/12): kernel-tools-libs-4.18.0-348.20.1.el8_5.x86_64.rpm                       16 MB/s | 7.0 MB     00:00    
(11/12): google-chrome-stable-99.0.4844.74-1.x86_64.rpm                           54 MB/s |  79 MB     00:01    
(12/12): firefox-91.7.0-3.el8_5.x86_64.rpm                                        24 MB/s | 106 MB     00:04    
-----------------------------------------------------------------------------------------------------------------
Total                                                                             52 MB/s | 324 MB     00:06     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                         1/1 
  Running scriptlet: kernel-core-4.18.0-348.20.1.el8_5.x86_64                                                1/1 
  Installing       : kernel-core-4.18.0-348.20.1.el8_5.x86_64                                               1/24 
  Running scriptlet: kernel-core-4.18.0-348.20.1.el8_5.x86_64                                               1/24 
  Installing       : kernel-modules-4.18.0-348.20.1.el8_5.x86_64                                            2/24 
  Running scriptlet: kernel-modules-4.18.0-348.20.1.el8_5.x86_64                                            2/24 
  Upgrading        : kernel-tools-libs-4.18.0-348.20.1.el8_5.x86_64                                         3/24 
  Running scriptlet: kernel-tools-libs-4.18.0-348.20.1.el8_5.x86_64                                         3/24 
  Upgrading        : kernel-tools-4.18.0-348.20.1.el8_5.x86_64                                              4/24 
  Installing       : kernel-4.18.0-348.20.1.el8_5.x86_64                                                    5/24 
  Installing       : kernel-modules-extra-4.18.0-348.20.1.el8_5.x86_64                                      6/24 
  Running scriptlet: kernel-modules-extra-4.18.0-348.20.1.el8_5.x86_64                                      6/24 
  Running scriptlet: google-chrome-stable-99.0.4844.74-1.x86_64                                             7/24 
  Upgrading        : google-chrome-stable-99.0.4844.74-1.x86_64                                             7/24 
  Running scriptlet: google-chrome-stable-99.0.4844.74-1.x86_64                                             7/24 
  Upgrading        : python3-perf-4.18.0-348.20.1.el8_5.x86_64                                              8/24 
  Upgrading        : kernel-headers-4.18.0-348.20.1.el8_5.x86_64                                            9/24 
  Upgrading        : bpftool-4.18.0-348.20.1.el8_5.x86_64                                                  10/24 
  Upgrading        : firefox-91.7.0-3.el8_5.x86_64                                                         11/24 
  Running scriptlet: firefox-91.7.0-3.el8_5.x86_64                                                         11/24 
  Installing       : kernel-devel-4.18.0-348.20.1.el8_5.x86_64                                             12/24 
  Running scriptlet: kernel-devel-4.18.0-348.20.1.el8_5.x86_64                                             12/24 
  Erasing          : kernel-4.18.0-348.2.1.el8_5.x86_64                                                    13/24 
  Running scriptlet: kernel-4.18.0-348.2.1.el8_5.x86_64                                                    13/24 
  Erasing          : kernel-devel-4.18.0-348.2.1.el8_5.x86_64                                              14/24 
  Running scriptlet: google-chrome-stable-99.0.4844.51-1.x86_64                                            15/24 
  Cleanup          : google-chrome-stable-99.0.4844.51-1.x86_64                                            15/24 
  Running scriptlet: google-chrome-stable-99.0.4844.51-1.x86_64                                            15/24 
  Cleanup          : kernel-headers-4.18.0-348.12.2.el8_5.x86_64                                           16/24 
  Running scriptlet: firefox-91.6.0-1.el8_5.x86_64                                                         17/24 
  Cleanup          : firefox-91.6.0-1.el8_5.x86_64                                                         17/24 
  Running scriptlet: firefox-91.6.0-1.el8_5.x86_64                                                         17/24 
  Erasing          : kernel-modules-extra-4.18.0-348.2.1.el8_5.x86_64                                      18/24 
  Running scriptlet: kernel-modules-extra-4.18.0-348.2.1.el8_5.x86_64                                      18/24 
  Cleanup          : kernel-tools-4.18.0-348.12.2.el8_5.x86_64                                             19/24 
  Erasing          : kernel-modules-4.18.0-348.2.1.el8_5.x86_64                                            20/24 
  Running scriptlet: kernel-modules-4.18.0-348.2.1.el8_5.x86_64                                            20/24 
  Running scriptlet: kernel-core-4.18.0-348.2.1.el8_5.x86_64                                               21/24 
  Erasing          : kernel-core-4.18.0-348.2.1.el8_5.x86_64                                               21/24 
warning: file /lib/modules/4.18.0-348.2.1.el8_5.x86_64/updates: remove failed: No such file or directory

  Running scriptlet: kernel-core-4.18.0-348.2.1.el8_5.x86_64                                               21/24 
  Cleanup          : kernel-tools-libs-4.18.0-348.12.2.el8_5.x86_64                                        22/24 
  Running scriptlet: kernel-tools-libs-4.18.0-348.12.2.el8_5.x86_64                                        22/24 
  Cleanup          : python3-perf-4.18.0-348.12.2.el8_5.x86_64                                             23/24 
  Cleanup          : bpftool-4.18.0-348.12.2.el8_5.x86_64                                                  24/24 
  Running scriptlet: kernel-core-4.18.0-348.20.1.el8_5.x86_64                                              24/24 
dkms: running auto installation service for kernel 4.18.0-348.20.1.el8_5.x86_64
 Done. 

  Running scriptlet: firefox-91.7.0-3.el8_5.x86_64                                                         24/24 
  Running scriptlet: bpftool-4.18.0-348.12.2.el8_5.x86_64                                                  24/24 
  Verifying        : kernel-4.18.0-348.20.1.el8_5.x86_64                                                    1/24 
  Verifying        : kernel-core-4.18.0-348.20.1.el8_5.x86_64                                               2/24 
  Verifying        : kernel-devel-4.18.0-348.20.1.el8_5.x86_64                                              3/24 
  Verifying        : kernel-modules-4.18.0-348.20.1.el8_5.x86_64                                            4/24 
  Verifying        : kernel-modules-extra-4.18.0-348.20.1.el8_5.x86_64                                      5/24 
  Verifying        : firefox-91.7.0-3.el8_5.x86_64                                                          6/24 
  Verifying        : firefox-91.6.0-1.el8_5.x86_64                                                          7/24 
  Verifying        : bpftool-4.18.0-348.20.1.el8_5.x86_64                                                   8/24 
  Verifying        : bpftool-4.18.0-348.12.2.el8_5.x86_64                                                   9/24 
  Verifying        : kernel-headers-4.18.0-348.20.1.el8_5.x86_64                                           10/24 
  Verifying        : kernel-headers-4.18.0-348.12.2.el8_5.x86_64                                           11/24 
  Verifying        : kernel-tools-4.18.0-348.20.1.el8_5.x86_64                                             12/24 
  Verifying        : kernel-tools-4.18.0-348.12.2.el8_5.x86_64                                             13/24 
  Verifying        : kernel-tools-libs-4.18.0-348.20.1.el8_5.x86_64                                        14/24 
  Verifying        : kernel-tools-libs-4.18.0-348.12.2.el8_5.x86_64                                        15/24 
  Verifying        : python3-perf-4.18.0-348.20.1.el8_5.x86_64                                             16/24 
  Verifying        : python3-perf-4.18.0-348.12.2.el8_5.x86_64                                             17/24 
  Verifying        : google-chrome-stable-99.0.4844.74-1.x86_64                                            18/24 
  Verifying        : google-chrome-stable-99.0.4844.51-1.x86_64                                            19/24 
  Verifying        : kernel-4.18.0-348.2.1.el8_5.x86_64                                                    20/24 
  Verifying        : kernel-core-4.18.0-348.2.1.el8_5.x86_64                                               21/24 
  Verifying        : kernel-devel-4.18.0-348.2.1.el8_5.x86_64                                              22/24 
  Verifying        : kernel-modules-4.18.0-348.2.1.el8_5.x86_64                                            23/24 
  Verifying        : kernel-modules-extra-4.18.0-348.2.1.el8_5.x86_64                                      24/24 
Installed products updated.

Upgraded:
  bpftool-4.18.0-348.20.1.el8_5.x86_64                  firefox-91.7.0-3.el8_5.x86_64                            
  google-chrome-stable-99.0.4844.74-1.x86_64            kernel-headers-4.18.0-348.20.1.el8_5.x86_64              
  kernel-tools-4.18.0-348.20.1.el8_5.x86_64             kernel-tools-libs-4.18.0-348.20.1.el8_5.x86_64           
  python3-perf-4.18.0-348.20.1.el8_5.x86_64            
Installed:
  kernel-4.18.0-348.20.1.el8_5.x86_64                        kernel-core-4.18.0-348.20.1.el8_5.x86_64            
  kernel-devel-4.18.0-348.20.1.el8_5.x86_64                  kernel-modules-4.18.0-348.20.1.el8_5.x86_64         
  kernel-modules-extra-4.18.0-348.20.1.el8_5.x86_64         
Removed:
  kernel-4.18.0-348.2.1.el8_5.x86_64                         kernel-core-4.18.0-348.2.1.el8_5.x86_64             
  kernel-devel-4.18.0-348.2.1.el8_5.x86_64                   kernel-modules-4.18.0-348.2.1.el8_5.x86_64          
  kernel-modules-extra-4.18.0-348.2.1.el8_5.x86_64          

Complete!

Hi,

Does the network issue occur if you boot into the previous kernel?

Thanks Tom.

I don’t know, I saw it just once.

It apparently does not repeat even when I boot this kernel.

Meanwhile, a new batch of updates hit today, so the world has changed anyway.

So all is good now?

Regards Tom.

I think so – one never knows, though.

I’m paying attention to this because I’ve seen some issues with specific SSH connections and tunnels to remote systems, where ports that the remote end expects to use are not present on the local system.

Although probably unrelated, I’m leery of any recent changes that might break seldom-used aspects of the many layers of network protocols.

For now, I’m just continuing to watch.


The first error looks serious, where did you see this, e.g. “on screen”?
Try to find the /var/log/messages from the exact time of the reboot up to the point where it crashed, and see if the crash is recorded in the log.

As a general rule, never reboot a machine when something goes wrong. Doing so means that you lose all the information about what’s wrong. e.g. find out why the network stopped working first, and then only reboot once you’ve found the answer and backed up the relevant log entries.
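As a rough sketch of how to pull such a window out of /var/log/messages, assuming the traditional "Mon DD HH:MM:SS" timestamp format that rsyslog writes by default: the function name log_window and the dates in the usage note are made up for illustration.

```shell
# Print syslog-style lines for one day whose time falls inside [START, END].
# Usage: log_window MONTH DAY START END < /var/log/messages
# log_window is a hypothetical helper name, not an existing tool.
log_window() {
    awk -v mon="$1" -v day="$2" -v start="$3" -v end="$4" \
        '$1 == mon && $2 + 0 == day + 0 && $3 >= start && $3 <= end'
}
```

For example, log_window Mar 16 09:00:00 09:30:00 < /var/log/messages would show half an hour of that day. If the journal is persistent, journalctl --list-boots and journalctl -b -1 -p err should give a similar view of the previous boot.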

The complaint I posted in the thread-starter was the content of a notifier that popped up in the lower right corner of the KDE Plasma desktop.

I have defined /var/log/journal and journalctl reports its usual lengthy content.

As a general rule, never reboot a machine when something goes wrong. Doing so means that you lose all the information about what’s wrong. e.g. find out why the network stopped working first, and then only reboot once you’ve found the answer and backed up the relevant log entries.

Understood. Nevertheless, this is the platform on which I do ALL of my work every day. I assume, perhaps rashly, that journalctl keeps enough to sort out what happened if it ever becomes needed.

I can’t afford to lose hours or days chasing an obscure and intermittent issue like this. If it happens frequently, maybe so.

I offer questions/topics like this in hopes that someone who does spend their days deeply engaged with all this will recognize something if it’s a serious issue.

By default, on Rocky 8.5, I don’t think journalctl will show anything before the current boot, but /var/log/messages does.

The original notification from KDE

Sounds like a kernel crash of some kind. The kernel version is the latest right now for Rocky 8.5.
Documentation:

Try this command as root to see if kdump is running
systemctl status kdump
Check /var/crash to see if there are any files in there.

When /var/log/journal is present, journalctl retains persistent information across reboots.

In my case, the logging begins on 17-Oct-2021:

# journalctl | more
-- Logs begin at Sun 2021-10-17 03:29:43 EDT, end at Fri 2022-03-18 09:11:07 EDT. --
Oct 17 03:29:43 localhost.localdomain kernel: Linux version 4.18.0-305.19.1.el8_4.x86_64 (mockbuild@ord1-prod-x86
build002.svc.aws.rockylinux.org) (gcc version 8.4.1 20200928 (Red Hat 8.4.1-1) (GCC)) #1 SMP Wed Sep 15 19:12:32 
UTC 2021
Oct 17 03:29:43 localhost.localdomain kernel: Command line: BOOT_IMAGE=(hd2,gpt5)/vmlinuz-4.18.0-305.19.1.el8_4.x
86_64 root=/dev/mapper/rl-root ro crashkernel=auto resume=/dev/mapper/rl-swap rd.lvm.lv=rl/root rd.lvm.lv=rl/swap
 rhgb quiet
Oct 17 03:29:43 localhost.localdomain kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point regist
ers'
Oct 17 03:29:43 localhost.localdomain kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Oct 17 03:29:43 localhost.localdomain kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
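For anyone unsure which case applies to their machine, here is a minimal sketch of the rule as I understand it from journald.conf: Storage=auto (the default) is persistent only when /var/log/journal exists. The function name journal_mode is my own, and it ignores drop-in files under journald.conf.d, so treat the result as a hint.

```shell
# Report whether journald is likely storing logs persistently.
# $1: config file (default /etc/systemd/journald.conf)
# $2: journal directory (default /var/log/journal)
# journal_mode is a hypothetical helper; drop-in config files are not read.
journal_mode() {
    conf="${1:-/etc/systemd/journald.conf}"
    dir="${2:-/var/log/journal}"
    # The last uncommented Storage= line wins; the default is "auto".
    storage=$(sed -n 's/^[[:space:]]*Storage=//p' "$conf" 2>/dev/null | tail -n 1)
    storage="${storage:-auto}"
    case "$storage" in
        persistent) echo persistent ;;
        auto) if [ -d "$dir" ]; then echo persistent; else echo volatile; fi ;;
        *) echo "$storage" ;;
    esac
}
```

Calling journal_mode with no arguments checks the standard locations.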

status:

[root@tms-desktop ~]# systemctl status kdump
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Thu 2022-03-17 09:23:55 EDT; 23h ago
 Main PID: 1979 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 203084)
   Memory: 0B
   CGroup: /system.slice/kdump.service

Mar 17 09:23:55 tms-desktop systemd[1]: Starting Crash recovery kernel arming...
Mar 17 09:23:55 tms-desktop kdumpctl[1979]: kdump: kexec: loaded kdump kernel
Mar 17 09:23:55 tms-desktop kdumpctl[1979]: kdump: Starting kdump: [OK]
Mar 17 09:23:55 tms-desktop systemd[1]: Started Crash recovery kernel arming.

Crash files:

[root@tms-desktop ~]# ls -l /var/crash
total 0

Regarding the journalctl logging, it sounds like you have it configured to persist across reboots, which is good. In that case, you should be able to find a small chunk from the exact time the network stopped working, and another small chunk from the exact time of the kernel crash. You have ‘kdump’ running, but there are no crash files. I’ve seen this before and am not sure of the cause; it can happen if there’s not enough disk space, for example.

If it’s a kernel crash, it could be specific to your machine, e.g. hardware glitch or graphics driver glitch where it only happens at random intervals.

Please prepare the system for kdump (usually crashkernel=auto works) and test it.

When the issue happens again, either execute

  echo 1 > /proc/sys/kernel/sysrq ; echo s > /proc/sysrq-trigger ; echo c > /proc/sysrq-trigger

or use the keyboard combination Alt + SysRq + s to sync all file systems and then Alt + SysRq + c to force the crash.
Note: sometimes the SysRq key is shared with Print Screen. To test whether it is actually that key, use the combination with the ‘m’ key, which prints the memory usage (verify with journalctl -k or dmesg -T).
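Before forcing a crash, it may be worth confirming that a crash kernel is both reserved and loaded. The sketch below assumes the usual sources: the crashkernel= argument in /proc/cmdline and the flag in /sys/kernel/kexec_crash_loaded. The function names are my own.

```shell
# Print the value of crashkernel= from a kernel command line string, if any.
crashkernel_arg() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | tail -n 1
}

# Succeed only if memory is reserved (crashkernel= present) and the
# kdump kernel is loaded (second argument is "1").
# Both helper names are hypothetical, chosen for this example.
kdump_ready() {
    cmdline="$1"
    loaded="$2"
    [ -n "$(crashkernel_arg "$cmdline")" ] && [ "$loaded" = "1" ]
}
```

Usage would be something like kdump_ready "$(cat /proc/cmdline)" "$(cat /sys/kernel/kexec_crash_loaded)" && echo ready.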


This looks interesting. At the moment, I’m stalled at the initial prepare-and-test for kdump step. I seem to have entered a mystery – where, in the spirit of the late Peter Gomes, “a mystery is something that we enter. Upon entering, a mystery opens itself to reveal other mysteries.”

According to my /etc/default/grub, I already have crashkernel set to auto. According to /etc/kdump.conf, the dump files should go in /var/crash. There is plenty of room in that file system (more than 50 GB).

Yet when I trigger a crash from the command line (I love that little fragment!), no file is created in /var/crash.

I therefore need to sort out why kdump is apparently not working on this system.
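A small sketch for double-checking where kdump will write and how much room is there, assuming the plain "path DIR" directive in /etc/kdump.conf and the documented default of /var/crash. The helper names kdump_path and avail_kib are made up.

```shell
# Print the dump directory configured in kdump.conf (default /var/crash).
# Commented-out lines such as "#path ..." are ignored.
kdump_path() {
    conf="${1:-/etc/kdump.conf}"
    p=$(awk '$1 == "path" { print $2 }' "$conf" 2>/dev/null | tail -n 1)
    printf '%s\n' "${p:-/var/crash}"
}

# Print the free space, in KiB, of the file system holding directory $1.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

Comparing avail_kib "$(kdump_path)" against the MemTotal line of /proc/meminfo is a conservative check, since a compressed, filtered dump is normally much smaller than RAM.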

I appreciate your guidance, and I’ll update this topic once I get kdump working.

I do like the idea of having a mechanism to at least see what’s happened if the network issue recurs. So far, that hasn’t happened.

So, I’m now debugging why I had issues with two RL8 installs. My main machine is now back, and that was something associated with the nvidia drivers not getting the modules updated with the new kernel version correctly. I had to remove and reinstall the NVIDIA drivers to get that working. That update I was watching as it went forward, and it started with 105 packages, and kept having to re-try rpm files, specifically firefox, thunderbird and google-chrome as the checksums were not matching.
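For others hitting the same NVIDIA problem, a quick way to see whether DKMS built anything for a given kernel is to filter the output of dkms status. This is only a sketch: the function name is invented, and it matches on the kernel version and the word "installed" rather than parsing the exact output format, which differs between DKMS releases.

```shell
# Read `dkms status` output on stdin and succeed if some module is
# reported as installed for kernel version $1.
# dkms_built_for is a hypothetical helper name.
dkms_built_for() {
    grep -F -- "$1" | grep -q 'installed'
}
```

For example: dkms status | dkms_built_for "$(uname -r)" || echo "no DKMS module installed for the running kernel".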

The second machine, however, I updated in a way I now regret, but I wasn’t at that point thinking anything would go south. I ssh’d into the second machine, sudo’d, and did a

  dnf update

The terminal window became unresponsive as it seemed to be close to finishing the update, and now it’s dead with the system going into “emergency mode”, where it asks for the root password, then I can fix things and reboot. That machine has lost its network connections, though it does see (from ifconfig) the loopback device.
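An interrupted transaction often leaves both the old and the new version of a package in the RPM database. As a hedged sketch, the filter below lists such duplicates while skipping the kernel packages and GPG keys, which are normally installed more than once. The function name dup_packages is my own.

```shell
# Read "name.arch" lines on stdin and print the ones that occur more than
# once, ignoring packages that are legitimately multi-installed.
# dup_packages is a hypothetical helper name.
dup_packages() {
    grep -Ev '^(kernel(-core|-modules|-modules-extra|-devel)?|gpg-pubkey)\.' \
        | sort | uniq -d
}
```

It could be fed with rpm -qa --qf '%{NAME}.%{ARCH}\n' | dup_packages, and dnf history info last should show whether the last transaction completed. Running a remote update inside tmux or screen avoids losing it when the ssh session drops.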

I had thought the issue was from my main machine not being alive, [dead machine does an nfs mount from the one that had issues with the kernel update], but after I’d resolved the issues with my main machine, I found the more central “non-functional network” was the real issue. Both machines have been working great up to this point in time. The dead machine is old, but it’s been running Rocky for the longest, and it’s been, up to this point, the most reliable. It is certainly not out of the question that I’ve lost hardware on it, and the two machines going belly up nearly simultaneously is simply coincidence.

Currently, that machine is textmode only (runlevel 3 in my aged parlance). I’m considering re-installation on it, as w/o network, I can’t read google/etc. to aid my debugging. [Note: runlevel actually comes back with “unknown”, not 3]

This was (for both machines) the kernel update to kernel-4.18.0-348.20.1.el8_5.x86_64, and on the dead machine, going back a kernel level doesn’t fix things either (at least one level). I’m about to head back downstairs to sit at the dead machine and try some more before letting things go for the nonce, but I did wonder if there was some software issue, as the machine was literally working perfectly right up to the ill-advised remote “dnf update”.

I do appreciate the quick primer on setting up kdump for RL8. I have another RHEL7 system that I’ve had to set that up for, not yet on RL8.

I’m adding this in case others are finding the same issue with the network but, for obvious reasons, are having trouble figuring it out without a functional network. If I do find that my issue is actually dead hardware, I’ll post that here.

Cheers,
Mike

Sometimes crashkernel=auto doesn’t reserve enough memory. For example, if you have a lot of USB devices or multipath devices, you will need more memory reserved.
The real clue comes if you watch the physical terminal or serial console while kdump is ‘taking over’: it will show a short and very fast OOM message and then reboot.

Also, kdump refuses to write anything if the target has too little space to hold the whole crash dump.
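If the automatic reservation turns out to be too small, a fixed size can be set with grubby. The sketch below only prints the command so it can be reviewed before being run as root. The function name is invented and 512M in the usage note is purely an illustrative value, not a recommendation for any particular machine.

```shell
# Print (do not run) a grubby command that sets a fixed crashkernel size.
# Accepts a number followed by K, M or G, e.g. 512M.
# crashkernel_cmd is a hypothetical helper name.
crashkernel_cmd() {
    size="$1"
    num="${size%[KMG]}"
    # Require digits plus a unit suffix; reject anything else.
    case "$num" in
        ''|*[!0-9]*) echo "usage: crashkernel_cmd SIZE (for example 512M)" >&2; return 1 ;;
    esac
    [ "$num" != "$size" ] || { echo "missing unit suffix K, M or G" >&2; return 1; }
    printf 'grubby --update-kernel=ALL --args=crashkernel=%s\n' "$size"
}
```

crashkernel_cmd 512M prints the grubby line; a reboot is needed afterwards for the new reservation to take effect.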

And, what I had, while very weird, may be completely unrelated.

I found the underlying issue, which was another mount, this one to mount the /boot/efi partition. I did a parted, and the partition seemed to exist. I did a fsck.vfat on it, and it came back with a clean bill of health, but when I tried to mount it I got back an error message saying that “vfat” was an unknown filesystem type.
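One possible explanation, which is my assumption and not something confirmed here: mount reports an unknown filesystem type when the vfat module cannot be loaded, and that happens if the module tree for the running kernel is missing or incomplete after an interrupted update. A minimal check, with a made-up function name, might look like this.

```shell
# Succeed if a vfat module file exists for kernel version $1.
# $2 is an optional root prefix (useful from a rescue environment).
# has_vfat_module is a hypothetical helper name.
has_vfat_module() {
    kver="$1"
    root="${2:-}"
    [ -d "$root/lib/modules/$kver" ] || return 1
    [ -n "$(find "$root/lib/modules/$kver" -name 'vfat.ko*' 2>/dev/null | head -n 1)" ]
}
```

For example: has_vfat_module "$(uname -r)" || echo "no vfat module for the running kernel". If vfat is already listed in /proc/filesystems, the module is loaded and this check is moot.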

I got around this by commenting out the /boot/efi line in /etc/fstab, and then the system would reboot into the previous kernel level. Network was just fine, had to re-install the nvidia junk, as I’d uninstalled that in my machinations to figure out what was happening, and surprise, now I was able to mount the /boot/efi partition again. (the system had re-gained its memory of what a vfat filesystem was…). I fixed the /etc/fstab, so it would mount /boot/efi as always, and then re-installed all the kernel-*.rpm files that were associated with the kernel-4.18.0-348.20.1.el8_5.x86_64 version. Went without a hitch.

Rebooted into the newest kernel, and all is well, at least as far as I can tell.

So I’m guessing this is unrelated, but this was more than a little strange. I’m also wondering if I should have done a “clean” step, as it just used the already downloaded rpm’s but I’m assuming those checksums were checked before all this went south.

Both of these updates were done March 19, at around noon Pacific time. This may well just be my local ISP having issues during this time, but wanted to get this out here in case others had issues also.

Bonus for me: much more systemctl literate than I was two days ago. Now I can find out which units are failing:

  systemctl --state=failed

which was how I saw that the /boot/efi partition was a problem. Did not require a full reinstall, but really weird. Why, running the previous kernel, could it not know how to mount a vfat file system, then figure that back out? Obviously, the machine had not come all the way to any normal state, but very, very weird.
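A failed mount shows up in that list under its unit name, which is derived from the mount point. The sketch below does the conversion for simple paths only; systemd-escape -p --suffix=mount is the proper tool and also handles special characters. The function name mount_unit_name is my own.

```shell
# Convert a plain mount point such as /boot/efi into its systemd unit name.
# Handles only simple paths (no characters that need escaping).
# mount_unit_name is a hypothetical helper name.
mount_unit_name() {
    p="${1#/}"
    p="${p%/}"
    if [ -z "$p" ]; then
        echo "-.mount"
    else
        printf '%s.mount\n' "$(printf '%s' "$p" | tr '/' '-')"
    fi
}
```

With that, systemctl status "$(mount_unit_name /boot/efi)" and journalctl -b -u boot-efi.mount should show why the mount failed during the current boot.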

Thanks much,
Mike

It seems that solved it, but just in case: sometimes fsck.vfat doesn’t remove the dirty flag, so I always repair with the force option.