Hi, I have a system running Rocky 9.5 (x86_64) that hosts 2 virtual machines (KVM/QEMU). If I issue the ‘reboot’ command on the host, then after the system is reachable by SSH again, the virtual machines show as running in ‘virsh’, but when checked with virt-viewer they are repairing filesystem errors or suffering from database corruption.
The existing documentation from Red Hat and the libvirt project doesn’t say much about how guests are suspended or shut down when the virtualization host is shut down. There used to be a libvirt-guests service that would issue the appropriate shutdown or managedsave commands to safely halt virtual machines, but that doesn’t seem to be enabled any more in RHEL/Rocky 9.x.
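For reference, libvirt still seems to ship a libvirt-guests unit on EL9, and enabling it may be the piece I’m missing. A minimal sketch, assuming the unit is present on the system and that /etc/sysconfig/libvirt-guests is the place it is configured:

# Sketch: have the host suspend/shut down guests as part of its own shutdown.
sudo systemctl enable --now libvirt-guests.service

# /etc/sysconfig/libvirt-guests (values below are illustrative, not necessarily the shipped defaults):
#   ON_BOOT=ignore          # do not auto-start guests at boot
#   ON_SHUTDOWN=shutdown    # send an ACPI shutdown to guests instead of managedsave
#   SHUTDOWN_TIMEOUT=300    # seconds to wait for guests before giving up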
What services should be running for libvirt and qemu on a fresh system? I tried following the latest Red Hat virtualization guide, but I end up with an entirely different set of daemons (virt*) instead of the modular (libvirt*) ones that used to be the defaults on RHEL/Rocky 8.
Let me know if there are any log files or package lists that will help diagnose the problem.
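In the meantime, here is the kind of thing I can gather (a sketch; the unit names may differ depending on which daemons are actually in use, and the journalctl call assumes persistent journaling so the previous boot is still available):

# Installed virtualization packages
rpm -qa | grep -Ei 'libvirt|qemu' | sort
# Which libvirt*/virt* units exist and whether they are enabled
systemctl list-unit-files 'libvirt*' 'virt*'
# Messages from the previous boot, i.e. the one that ended with the reboot
journalctl -b -1 -u libvirtd -u virtqemud -u libvirt-guests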
Can you clarify the exact names of these daemons on a) Rocky 8.x and b) Rocky 9.x?
It’s a good question about rebooting a host while guest VMs are running. Even if there were a command to stop all VMs, it might not work, e.g. if a service is running on the guest that can’t stop instantly.
Per previous experience and the RHEL 8 virtualization guide, the only required services on Rocky 8 should be:
libvirtd.service
For RHEL 9 (and presumably Rocky 9) there are many more daemons; so many, in fact, that the guide enables them via a bash script. I’ll expand the list here for convenience (confirmed on a fresh system), with the loop itself sketched after the list:
virtqemud.socket
virtqemud-ro.socket
virtqemud-admin.socket
virtnetworkd.socket
virtnetworkd-ro.socket
virtnetworkd-admin.socket
virtnodedevd.socket
virtnodedevd-ro.socket
virtnodedevd-admin.socket
virtnwfilterd.socket
virtnwfilterd-ro.socket
virtnwfilterd-admin.socket
virtsecretd.socket
virtsecretd-ro.socket
virtsecretd-admin.socket
virtstoraged.socket
virtstoraged-ro.socket
virtstoraged-admin.socket
virtinterfaced.socket
virtinterfaced-ro.socket
virtinterfaced-admin.socket
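The loop in the guide is roughly along these lines (paraphrased from memory, so treat the exact driver list and options as an assumption and check the current docs):

# Enable the per-driver daemons and their sockets
for drv in qemu network nodedev nwfilter secret storage interface; do
    sudo systemctl enable virt${drv}d.service
    sudo systemctl enable virt${drv}d{,-ro,-admin}.socket
done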
Now obviously these do not say ‘libvirt’ anymore, but I have a production host running 9.5 that has these running:
libvirtd-ro.socket
libvirtd.service
libvirtd.socket
virtinterfaced.socket
virtlockd.socket
virtlogd.socket
virtnetworkd.socket
virtnodedevd.socket
virtnwfilterd.socket
virtproxyd.socket
virtqemud-admin.socket
virtqemud-ro.socket
virtqemud.service
virtqemud.socket
virtsecretd.socket
virtstoraged.socket
So apparently you can have both the old libvirt* units and the new virt* ones enabled at the same time, and that might be causing part of the problem…
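One way to see which of the two families is actually enabled on a given host (a sketch, assuming the unit name patterns above):

# Enable state of every libvirt-related unit file
systemctl list-unit-files 'libvirtd*' 'virt*d*'
# What is actually loaded/running right now
systemctl list-units --all 'libvirtd*' 'virt*d*'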
Where it says ‘modular’, that looked back-to-front to me. I’m calling the old ones ‘monolithic’ and the new ones ‘modular’.
On my 9.5 I see the same list as you, mostly virt*. I also see the old libvirtd* units, but they are disabled and carry the description ‘legacy monolithic daemon’.
@gerry666uk Yes, it looks like I mixed up those terms. Per the libvirt documentation either setup is valid, but for completeness’ sake I’ll put the corrected terminology below, along with a sketch of switching a host over to the modular layout after the quoted docs.
Monolithic = single libvirtd.service
Modular = virt*.service and virt*.socket
Monolithic vs modular daemons
Traditionally libvirt provided a single monolithic daemon called libvirtd which exposed support for all the stateful drivers, both primary hypervisor drivers and secondary supporting drivers. It also enables secure remote access from clients running off host.
Work is underway for the monolithic daemon to be replaced by a new set of modular daemons virt${DRIVER}d, each one servicing a single stateful driver. A further virtproxyd daemon will provide secure remote access, as well as backcompatibility for clients using the UNIX socket path of the monolithic daemon.
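Based on that description, switching a host from the monolithic to the modular layout would presumably look something like this (a sketch only; the libvirt page on modular daemons has the authoritative steps, and virtproxyd is only needed if remote clients or tools expecting the old socket path must keep working):

# Stop and disable the monolithic daemon and its sockets
sudo systemctl disable --now libvirtd.service libvirtd{,-ro,-admin}.socket

# Enable the per-driver daemons and sockets
for drv in qemu network nodedev nwfilter secret storage interface; do
    sudo systemctl enable --now virt${drv}d.service virt${drv}d{,-ro,-admin}.socket
done

# Optional: virtproxyd keeps the old /var/run/libvirt/libvirt-sock path working
sudo systemctl enable --now virtproxyd.service virtproxyd{,-ro,-admin}.socket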
OK, so do you confirm you only have the new modular ones enabled?
Maybe you need to cleanly shut down the guests before rebooting the host (see the loop sketched below).
But this must be an issue even for big hosting providers; I mean, what happens if you reboot a VMware host while 1000 guests are all running and doing different things?
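By ‘cleanly shut down’ I had something like this in mind, run before rebooting the host (a sketch; it assumes the guests respond to ACPI shutdown requests, and it will wait forever if one of them hangs):

# Ask every running guest to shut down, then wait until none are left running
for vm in $(virsh list --name); do
    virsh shutdown "$vm"
done
while [ -n "$(virsh list --name)" ]; do
    sleep 5
done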
If high availability is configured correctly, you put the ESX host into maintenance mode and the VMs are automatically moved to other ESX hosts in the cluster; then you can reboot or shut down the host.
Yes, I have the modular services running (except on one host, which I will be converting over shortly). I need to dig deeper into virtqemud and QEMU to see what their default behavior is for shutdown/reboot. There’s also the possibility of QEMU doing its own disk caching that could cause some writes to get thrown away.
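To check the caching angle, the cache mode (if any) shows up in the disk driver element of each guest’s XML; something like this should surface it (a sketch; if no cache attribute is present, the hypervisor default applies, which for QEMU is typically writeback):

# Show the disk driver settings for every defined guest
for vm in $(virsh list --all --name); do
    echo "== $vm"
    virsh dumpxml "$vm" | grep "driver name='qemu'"
done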