Good evening everyone.
I recently upgraded my system with a second 3090, and Rocky no longer boots past the GRUB loader; I only see the mobo’s logo, stuck there forever.
My system was working nicely before this upgrade and my NVIDIA drivers were working perfectly; the only difference now is that I have a second GPU.
No idea what to do, any thoughts?
Thanks in advance.
Only the UEFI splash image? Not even GRUB menu?
- Can you get into UEFI?
- Can you ssh into the machine (even though it shows “nothing”)?
- Does the system “work” with just the new card (i.e. remove the old, good card)?
- Have you tried disabling “Fast boot”, etc settings in UEFI to make it as verbose as possible (i.e. no splash)?
- Do you have displays on both cards?
- You can remove the second card and change the default target:
systemctl set-default multi-user.target
Does that behave any differently?
Hi @jlehtone thanks for your quick response,
I can get into the GRUB command line if I press “c” on the GRUB list, but not sure what to do inside here.
I’m not sure how to use ssh; could you please explain? I’m a new user.
The system works with just one card. I have tested with a single GPU, swapping between the old and the new one, and Rocky works fine with either. It is only when the second PCIe slot is used that it doesn’t boot at all and gets stuck on the mobo’s logo. I have also tested the system on Windows with both GPUs connected and it works, so the mobo and both cards are fine.
“Fast Boot” has always been disabled, and I believe settings there are as verbose as possible.
I have 2 displays connected to only the first card, second card has no output cables attached to it.
I will try this:
systemctl set-default multi-user.target
to see whether I can boot into the no-GUI mode. If I’m able to boot like this, what should I look for? Should I enable or disable something if I get this far?
Thanks again for all your help
ssh is a tool to connect to a machine over the network, so it requires a second machine from which you run ssh. Even Windows has ssh these days. If one can connect with ssh, then one has a (CLI) bash session on the remote machine.
Likewise, one gets a (local) CLI bash session if the machine boots properly to multi-user.target.
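As a rough sketch of the ssh test: from the second machine one can first check whether the stalled machine answers on the ssh port at all, and only then log in. The helper name `ssh_port_open`, the address `192.168.1.50` and the user `myuser` are placeholders, not anything from this thread. The sketch assumes bash and the coreutils `timeout` command on the second machine, and that sshd is running on the Rocky box.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens
# within 3 seconds. Relies on bash's built-in /dev/tcp redirection.
ssh_port_open() {
    local host="$1" port="${2:-22}"   # port defaults to 22 (ssh)
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example, run from the second machine (address and user are placeholders):
#   ssh_port_open 192.168.1.50 && ssh myuser@192.168.1.50
```

If the port check succeeds while the screen still shows only the logo, the system itself has most likely booted and only the graphics part is stuck.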
In either kind of session, one can look at the boot.log* files, as those might have error messages that hint at what went wrong.
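A minimal sketch of how one might skim those logs for likely culprits. The function name `scan_boot_logs` and the keyword list are my own choices for illustration. The example path assumes the logs live under /var/log, which is the usual location on Rocky; adjust if yours differ.

```shell
# Hypothetical helper: print every line that mentions a common failure
# keyword, prefixed with the name of the log file it came from.
scan_boot_logs() {
    grep -i -H -E 'fail|error|nvidia|nvrm' "$@" 2>/dev/null
}

# Example, run as root (assumes logs are under /var/log):
#   scan_boot_logs /var/log/boot.log*
# Error-level messages from the current boot, via the systemd journal:
#   journalctl -b -p err
```

`journalctl -b -1` would show the previous boot instead, but only if the journal is kept on disk, which may not be the case on a default install.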
What if the system directs output to the “wrong” card? At least as a test, one could connect a display to both.
I managed to boot with no GUI, with both GPUs installed, using
sudo systemctl set-default multi-user.target
But if I switch back with
sudo systemctl set-default graphical.target
it does the same as before and doesn’t fully boot.
I also checked the installed driver version and it seems to be the latest from the NVIDIA server, 545.x.x. I also tested again and I can fully boot Rocky using just a single 3090 card in the first PCIe slot; I then swapped cards, connected the second 3090 to the first slot, and Rocky also boots. So both cards seem to be recognized by Rocky and they seem to work independently. The problem comes when combining the two of them.
I also tried, in graphical mode, swapping cables between different ports and different cards, and unfortunately it doesn’t make any difference.
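Since the machine does boot to multi-user.target with both cards installed, one way to confirm that both are actually detected in that state is to count them in the `lspci` output. This is only a sketch: the helper name `count_nvidia_gpus` is made up, and it assumes the cards show up as "VGA compatible controller" or "3D controller" entries, which is how `lspci` normally labels GPUs.

```shell
# Hypothetical helper: count NVIDIA display devices in `lspci` output
# read from stdin (the cards' audio functions are not counted).
count_nvidia_gpus() {
    grep -i -E '(VGA compatible|3D) controller' | grep -i -c 'nvidia'
}

# Usage on the affected machine:
#   lspci | count_nvidia_gpus   # should print 2 if both cards are detected
#   nvidia-smi -L               # lists the GPUs the driver itself can see
```

If `lspci` counts two cards but `nvidia-smi -L` lists only one (or fails), that would point at the driver rather than at the hardware or the slot.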
When you have booted to multi-user.target, you can switch to the GUI, i.e. start it, with:
sudo systemctl isolate graphical.target
That will probably have the same effect as booting to graphical, but you at least know that the system is up.
One can usually switch to text consoles with Ctrl-Alt-F3 (or F4, F5, F6) and back to the GUI session with Ctrl-Alt-F1 or Ctrl-Alt-F2. You can test that with one card installed (to see that you can), and then with two cards, to see whether the text consoles work despite the GUI being stalled.
If you can do the last bit, then you have a (text) view to the “live” system that lacks graphics. That may show a bit more than mere logs.
One can also run:
sudo systemctl isolate graphical.target ; sleep 5m ; sudo systemctl isolate multi-user.target
That should switch to graphical, wait 5 minutes, and then switch back to text mode.
Even if the GUI fails, you should regain access to the system after the wait.
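The same sequence could be wrapped in a small function so the wait is adjustable and it can be dry-run first. This is just a sketch: the function name `try_gui_then_revert` and the `SYSTEMCTL` override variable are my own inventions, not part of systemd.

```shell
# Hypothetical wrapper around the command sequence above.
# Set SYSTEMCTL=echo to print the commands instead of running them.
try_gui_then_revert() {
    local wait="${1:-5m}"                          # how long to leave the GUI up
    local systemctl="${SYSTEMCTL:-sudo systemctl}" # command used for switching

    $systemctl isolate graphical.target    # attempt to start the GUI
    sleep "$wait"                          # wait, even if the screen is stuck
    $systemctl isolate multi-user.target   # fall back to text mode
}

# Dry run:   SYSTEMCTL=echo try_gui_then_revert 0
# Real run:  try_gui_then_revert 2m
```

Depending on the sudo configuration, the second `sudo` may ask for the password again after the wait. Running from a root shell with `SYSTEMCTL=systemctl` avoids that.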