Hi there,
I use the official Rocky Linux 8 on the Azure cloud platform to deploy VMs.
Publisher ID: resf
Product ID: rockylinux-x86_64
Plan ID: 8-base
It is a part of my auto-testing infrastructure for testing my application.
The tests deploy several VMs at a time using the Azure Resource Manager(ARM) template, after the VMs are deployed(provisioning state is successful), the scripts try to connect to the VMs via SSH, but sometimes one or two of the deployed VMs do not respond. The tests are trying to connect via SSH for about 7 minutes and fail.
I enabled the boot diagnostics on Azure to collect the serial console log and see if there were any errors, but it looks ok.
The only thing that bothers me is that I see the âNetwork Managerâ restarted after Cloud-init was completed.
[ 7104.638646] cloud-init[2498]: Cloud-init v. 23.1.1-9.el8.0.1 running 'modules:final' at Wed, 23 Oct 2024 19:58:14 +0000. Up 7101.02 seconds.
[ 7104.648989] cloud-init[2498]: Cloud-init v. 23.1.1-9.el8.0.1 finished at Wed, 23 Oct 2024 19:58:18 +0000. Datasource DataSourceAzure [seed=/dev/sr0]. Up 7104.62 seconds
[e[0;32m OK e[0m] Started Execute cloud user/final scripts.
[e[0;32m OK e[0m] Reached target Cloud-init target.
[ 7104.798778] fbcon: Taking over console
[ 7104.803288] Console: switching to colour frame buffer device 128x48
Rocky Linux 8.9 (Green Obsidian)
Kernel 4.18.0-513.5.1.el8_9.x86_64 on an x86_64
Activate the web console with: systemctl enable --now cockpit.socket
upg-1 login: [e[0;32m OK e[0m] Stopped Network Manager Wait Online.
Stopping Network Manager Wait Online...
Stopping Network Manager...
[e[0;32m OK e[0m] Stopped Network Manager.
Starting Network Manager...
[e[0;32m OK e[0m] Started Network Manager.
Starting Network Manager Wait Online...
[e[0;32m OK e[0m] Started Network Manager Wait Online.
[e[0;32m OK e[0m] Started Crash recovery kernel arming.
[e[0;1;31mFAILEDe[0m] Failed to start dnf makecache.
See 'systemctl status dnf-makecache.service' for details.
Once, I reproduced the problem manually (but with disabled boot diagnostics), then I tried to reboot the VM, and after the reboot, it started responding.
I should say that I use other images for the testing, but have problems only with Rocky.
Is it possible that both running Cloud-init and Azure Linux Agent simultaneously on the VM try to update network configuration and conflict or something?
I appreciate any advice.