Can install 9.0 but not 9.1 from USB drive

I am looking for help resolving a persistent “not syncing” panic stop that occurs when installing Rocky Linux 9.1, but not when installing 9.0 under exactly the same circumstances.

Context

The machine is currently running 9.1 (upgraded from 9.0) without any obvious issues, but I’m now planning a fresh reinstall of 9.1 in order to put it into service as a KVM/QEMU host.

Hardware

  • M/B: ASUS Prime X570-P
  • CPU: Ryzen 5 5600X 4th Gen 6-core
  • RAM: Crucial Ballistix 3600 MHz DDR4 DRAM (2x16GB)
  • SSD: SAMSUNG 980 PRO 1TB PCIe NVMe Gen4
  • VGA: ATI AMD FirePro 2270 512MB PCI-E

UEFI BIOS Settings

  • “Factory” settings
  • Secure boot disabled
  • Never overclocked; RAM running at 2666MHz

Things I have tried

  • Using different USB drives.
  • Removing the two 16GB RAM sticks and using only one at a time.
  • Removing all cards except for the video card used for the console.

I have no experience debugging kernel panic stops and have had no luck searching this forum or googling for answers.

Kernel Panic

Based on the panic error below, I suspect a video driver issue even though the video card works fine for the console when I boot the existing Rocky 9.1 installation.

Can you try adding nomodeset to the kernel line before booting the installer? Perhaps that will help.

Thanks. Unfortunately, adding nomodeset and continuing with Ctrl+X resulted in what appears to be exactly the same panic message.

Maybe inst.xdriver=vesa

Thanks. Unfortunately the problem persists.
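The options tried so far (nomodeset, inst.xdriver=vesa) are kernel command-line parameters. When experimenting with such options on a system that does boot, such as the existing 9.1 install, it may be worth confirming that the edited line actually reached the kernel: Linux exposes the command line it was booted with in /proc/cmdline. A minimal sketch, assuming whitespace-separated tokens and ignoring quoted values; the helper names cmdline_has_param and read_proc_cmdline are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `cmdline` contains the parameter `name`,
 * either as a bare flag ("nomodeset") or as key=value
 * ("inst.xdriver=vesa" matches name "inst.xdriver"). Assumes tokens are
 * separated by whitespace; quoted values containing spaces are not handled. */
int cmdline_has_param(const char *cmdline, const char *name)
{
    size_t len = strlen(name);
    const char *p = cmdline;

    while (*p != '\0') {
        /* skip separating whitespace */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (*p == '\0')
            break;
        /* a match must cover the whole token or stop right before '=' */
        if (strncmp(p, name, len) == 0 &&
            (p[len] == '\0' || p[len] == ' ' || p[len] == '\t' ||
             p[len] == '\n' || p[len] == '='))
            return 1;
        /* advance to the end of the current token */
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
    }
    return 0;
}

/* Hypothetical helper: reads /proc/cmdline into `buf`.
 * Returns 0 on success, -1 on failure. */
int read_proc_cmdline(char *buf, size_t size)
{
    FILE *f = fopen("/proc/cmdline", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

Typical use would be read_proc_cmdline(buf, sizeof buf) followed by cmdline_has_param(buf, "nomodeset"). Matching whole tokens rather than substrings avoids false hits such as inst.text matching inst.text2.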

Can you clarify the BIOS settings in both cases (9.0 and 9.1): CSM enabled or disabled, Secure Boot, UEFI, etc.?

I’m interested in the AMD FirePro 2270; is that the passively cooled one?
I really liked that card, but ran into a problem with it: it didn’t support UEFI (without an unsupported firmware hack), and after looking more into the way it connected to the bus, I decided I’d have to replace it with a newer one.

Thanks for the suggestions and apologies for the late reply.

The quick reply is that the BIOS settings were identical in both cases and secure boot was disabled.

I will try your suggestion and report results tomorrow.

The default CSM settings are shown below.

I also tried setting all options to “UEFI Only” and “Legacy OPROM” with no success in either configuration.

[Screenshot: default CSM settings]

Yes, the 2270 has no fan. Notably, it worked fine for the console under 9.0.

Hi. I’m not getting anywhere with this problem. Please let me know if you can think of any other things to try. Thanks!

Maybe inst.text (or just text alone).

I’ve never tried either but it’s worth a shot.

Alternatively, since installing 9.0 works, why not stop beating your head against the wall and just install that? Run dnf upgrade afterward and you should be all set.

Can you borrow a graphics card or use onboard graphics to see if it makes any difference? I notice you have CSM enabled, along with a number of legacy settings, which I think will become unsustainable as time goes on. I don’t know if you’ve applied UEFI firmware to the GOP, but it doesn’t say so. Be careful if you disable CSM, because it could cause a black screen (as in not being able to see the BIOS at all).

In general, from RHEL 8.0 onwards, you are expected to use UEFI, GPT partitions and Secure Boot.
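Since the CSM versus UEFI question keeps coming up, one quick way to confirm how a running system was actually booted is the directory /sys/firmware/efi, which the Linux kernel normally creates only when it was started through UEFI firmware (with CSM/legacy boot it is absent). A minimal sketch; is_directory and booted_via_uefi are hypothetical helper names:

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `path` exists and is a directory. */
int is_directory(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Hypothetical helper: returns 1 if the running kernel appears to have been
 * booted through UEFI (/sys/firmware/efi present), 0 if it appears to have
 * been booted in legacy BIOS/CSM mode. */
int booted_via_uefi(void)
{
    return is_directory("/sys/firmware/efi");
}
```

Running this on the installed system, and comparing against what the firmware setup screens claim, is a cheap way to check that a BIOS change actually took effect.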

Thanks for the suggestion. inst.test and test did not solve the problem.

I noted earlier that I already have a running 9.1 installation upgraded from 9.0 on the box.

Since I’m planning to use this machine as a VM host for all my work, I want to make sure the hardware is good.

That said, I may resort to the 9.0 to 9.1 upgrade route if all else fails.

Ahem.

text != test

You might want to try that again.

My Ryzen processor does not have built-in graphics.

I do have a Quadro 620 but I have not been able to make it work as the console device at bootup.

I tried using the UEFI setting for everything earlier, without success.

I’m not sure what “applied UEFI firmware to the GOP” means.

Thanks!

Sorry, I did try text, not test. 🙂

Post mortem:

After a fresh 9.0 install and upgrade to 9.1, the panic stop is still there. So it seems that the video card, which was supported in 9.0, is no longer supported in 9.1.

When people purchase a RHEL subscription, they expect the major release to run from start to finish, a full decade. If Red Hat were to drop hardware support in a point update, it would break that expectation. “No longer supported” is thus unthinkable in Enterprise Linux.

An error in the Rocky build or a regression introduced by Red Hat is a much more likely explanation.

The challenge is figuring out how to diagnose the root cause.

This is interesting. People were saying to just install 9.0, then upgrade to 9.1, and everything would be fine, but that didn’t make sense to me; it would imply that booting 9.1 after an upgrade is different from booting it from a boot device such as a USB drive.

We don’t know it’s the video card for sure, but we need to rule it out.
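One way to start ruling the card in or out is to check which kernel driver claims it on a release that works. On Linux, /sys/bus/pci/devices/<address>/driver is a symlink to the bound driver's directory (for example .../bus/pci/drivers/amdgpu or .../drivers/radeon), so the driver name is the last component of the link target. A minimal sketch; the helper names and the example PCI address are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copies the last path component of `link_target` into `out`,
 * e.g. "../../../bus/pci/drivers/amdgpu" -> "amdgpu".
 * Returns 0 on success, -1 if `out` is too small. */
int driver_name_from_link(const char *link_target, char *out, size_t out_size)
{
    const char *slash = strrchr(link_target, '/');
    const char *name = slash ? slash + 1 : link_target;

    if (strlen(name) + 1 > out_size)
        return -1;
    strcpy(out, name);
    return 0;
}

/* Hypothetical helper: looks up the driver bound to the PCI device at
 * `pci_addr` (a made-up example would be "0000:0a:00.0").
 * Returns -1 if the device does not exist or no driver is bound. */
int pci_bound_driver(const char *pci_addr, char *out, size_t out_size)
{
    char path[512];
    char target[512];
    ssize_t n;

    snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/driver", pci_addr);
    n = readlink(path, target, sizeof target - 1);
    if (n < 0)
        return -1;
    target[n] = '\0'; /* readlink() does not NUL-terminate */
    return driver_name_from_link(target, out, out_size);
}
```

Without writing any code, lspci -k reports the same information on its "Kernel driver in use:" line. Comparing that line between a working 9.0 boot and the upgraded 9.1 install would show whether a different driver is now claiming the card.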

I’m surprised this card works in 9.0 with the “Compatibility Support Module” (CSM) disabled; it seems impossible.

Did RH deliberately drop support between 9.0 and 9.1? Maybe not. Did they do it accidentally? Maybe. The release notes are not as clear as they should be.

From the panic screenshot, I was able to narrow down the location of the unexpected exception that causes the crash: it happens in the amdgpu driver, in a C function named “amdgpu_device_fini_sw”.

Googling the name of the routine turned up some interesting recent changes to the driver.

The code follows. Note that the routine takes a pointer to the device data structure as an argument and appears to be tearing down (“finalizing”) the driver’s state for the device, including its reset handling.

I have not worked with GPU drivers, so I don’t know how to debug this issue, especially when it occurs at boot time. 🙁

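```c
/*
 * amdgpu_device_fini_sw() from the amdgpu driver
 * (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c): the software half of
 * device teardown, which frees the driver's per-device state.
 */
void amdgpu_device_fini_sw(struct amdgpu_device *adev)
{
	int idx;

	/* tear down fence handling and the software state of each IP block */
	amdgpu_fence_driver_sw_fini(adev);
	amdgpu_device_ip_fini(adev);
	/* release the gpu_info firmware image and mark acceleration unusable */
	release_firmware(adev->firmware.gpu_info_fw);
	adev->firmware.gpu_info_fw = NULL;
	adev->accel_working = false;
	/* drop the reference held on the last gang-submit fence */
	dma_fence_put(rcu_dereference_protected(adev->gang_submit, true));

	/* tear down the driver's GPU-reset handling */
	amdgpu_reset_fini(adev);

	/* free i2c buses */
	if (!amdgpu_device_has_dc_support(adev))
		amdgpu_i2c_fini(adev);

	/* free ATOM BIOS state (skipped in emulation mode) */
	if (amdgpu_emu_mode != 1)
		amdgpu_atombios_fini(adev);

	/* free the driver's copy of the video BIOS */
	kfree(adev->bios);
	adev->bios = NULL;
	/* hybrid-graphics (PX) systems: unregister from vga_switcheroo */
	if (amdgpu_device_supports_px(adev_to_drm(adev))) {
		vga_switcheroo_unregister_client(adev->pdev);
		vga_switcheroo_fini_domain_pm_ops(adev->dev);
	}
	/* VGA-class devices: unregister from the VGA arbiter */
	if ((adev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
		vga_client_unregister(adev->pdev);

	/* unmap registers and doorbells only if the device is still present */
	if (drm_dev_enter(adev_to_drm(adev), &idx)) {

		iounmap(adev->rmmio);
		adev->rmmio = NULL;
		amdgpu_device_doorbell_fini(adev);
		drm_dev_exit(idx);
	}

	/* remove perf (PMU) support and IP discovery data */
	if (IS_ENABLED(CONFIG_PERF_EVENTS))
		amdgpu_pmu_fini(adev);
	if (adev->mman.discovery_bin)
		amdgpu_discovery_fini(adev);

	/* drop the reference to the reset domain */
	amdgpu_reset_put_reset_domain(adev->reset_domain);
	adev->reset_domain = NULL;

	/* free the saved PCI configuration state */
	kfree(adev->pci_state);

}
```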
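On the question of how to debug this: on x86-64 the panic text normally identifies the faulting location as symbol+offset/size, on a line shaped like RIP: 0010:amdgpu_device_fini_sw+0x1c/0x2b0 [amdgpu] (these numbers are made up; the real ones are in the screenshot). The offset says how far into the function the fault occurred, and the kernel source tree's scripts/faddr2line tool can map symbol+offset to a source line, given an amdgpu module with debug info that matches the panicking kernel. A minimal sketch that extracts those fields, assuming the modern single-line RIP format; struct oops_location and parse_oops_location are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a "func+0xOFF/0xSIZE" location. */
struct oops_location {
    char function[128];
    unsigned long offset; /* bytes into the function where the fault occurred */
    unsigned long size;   /* total size of the function in bytes */
};

/* Hypothetical parser for an oops line such as
 * "RIP: 0010:amdgpu_device_fini_sw+0x1c/0x2b0 [amdgpu]".
 * Assumes the modern x86-64 format. Returns 0 on success, -1 otherwise. */
int parse_oops_location(const char *line, struct oops_location *loc)
{
    const char *start = strstr(line, "RIP: ");
    const char *colon;

    if (start == NULL)
        return -1;
    start += strlen("RIP: ");
    /* skip the code-segment selector ("0010:") if present */
    colon = strchr(start, ':');
    if (colon != NULL)
        start = colon + 1;
    /* function name up to '+', then hex offset and hex size */
    if (sscanf(start, "%127[^+]+0x%lx/0x%lx",
               loc->function, &loc->offset, &loc->size) != 3)
        return -1;
    return 0;
}
```

With the real values from the screenshot, the lookup would take the form scripts/faddr2line amdgpu.ko amdgpu_device_fini_sw+0x1c/0x2b0 (again with made-up numbers), which should point at the specific statement inside the function.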