Will give it a whirl!
This often happens when there is something wrong with the hardware.
Since this is an HP product, you can do some hardware checks in iLO. Are all device states green in iLO?
The fact that it fails at different points indicates a hardware defect, most likely heat-related (though it could be power-related as well). If it were a compatibility issue or a corrupt image, it would fail the same way at the same point every time.
90% chance it is your power supply. Read this.
Today's power supplies "get weak" years before they actually fail. As long as you aren't stressing the system, they "seem" to work.
5% chance you have a RAM module failing. Get MemTest86 and make a bootable thumb drive. Do at least a 3-pass test with it. Keep scratch paper handy and every so often write down how far it got. Don't just sit there staring at it, because it could take all night and part of the next few days depending on the amount of RAM and the number of passes.
Sometimes you get lucky with MemTest86 and it identifies the module without causing the reboot. The scratch paper and notes are for when you don't. Thankfully, with today's massive memory modules, you really only need to figure out the address range for each module, then keep track of which one it is working on now.
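A rough Python sketch of that address-range bookkeeping. It assumes, purely for illustration, that modules occupy consecutive address ranges in slot order with no channel interleaving and no reserved hole below 4 GiB. Real boards interleave and reserve address space, so treat the answer as a hint about which module to pull first, not proof. The helper names and module sizes are hypothetical.

```python
# Hypothetical helpers: map a failing address reported by MemTest86 to a slot,
# ASSUMING a simple linear layout (no interleaving, no PCI hole). Real
# chipsets rarely work this way, so this only narrows the search.

GIB = 1024 ** 3


def build_ranges(module_sizes_gib):
    """Return (slot, start, end) tuples for consecutive modules; end is exclusive."""
    ranges = []
    start = 0
    for slot, size_gib in enumerate(module_sizes_gib, start=1):
        end = start + size_gib * GIB
        ranges.append((slot, start, end))
        start = end
    return ranges


def module_for_address(address, ranges):
    """Return the slot whose range contains address, or None if out of range."""
    for slot, start, end in ranges:
        if start <= address < end:
            return slot
    return None


if __name__ == "__main__":
    # Example: four 8 GiB modules (made-up sizes for illustration).
    ranges = build_ranges([8, 8, 8, 8])
    for slot, start, end in ranges:
        print(f"slot {slot}: {start:#x} - {end - 1:#x}")
    print("failing address is in slot", module_for_address(0x2_8000_0000, ranges))
```

Write the printed ranges on the scratch paper; when the reboot hits, the last address MemTest86 showed tells you which range (and, under these assumptions, which module) it was in.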
There is a 5% chance you are overheating. I can only post two links, so I can't send you to the article.
I currently have an HP z820 that "was working fine" when I had Windows 10 on it, but I blew the dust out of the case the other day, wiped the system, then installed openSUSE. The BIOS throws up the infamous "517 - memory fan not found" error. I can see two of the three fans are still spinning; the third can't be seen. I can't find a Linux utility (yet) that lets me see how fast they are spinning. At any rate, 3 new fans arrive Monday.
I took that detour because you could have a weak fan in your case/setup. It may not be making noise and it may be spinning, just not spinning like it is supposed to.
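On Linux, fan tachometer readings (when the board's sensor driver supports them) appear under the kernel's hwmon sysfs interface as `fanN_input` files holding RPM values; lm-sensors' `sensors` command reads the same data. The Python sketch below just walks that tree and flags fans reading 0 RPM. Whether a given HP workstation exposes its fans this way depends on the driver, so an empty result does not mean the fans are dead. The function name is hypothetical.

```python
from pathlib import Path

HWMON_ROOT = Path("/sys/class/hwmon")


def read_fan_speeds(root=HWMON_ROOT):
    """Return {(chip_name, fan_label): rpm} for every fan*_input under root."""
    speeds = {}
    for chip in sorted(root.glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for fan_input in sorted(chip.glob("fan*_input")):
            # fanN_input -> fanN; an optional fanN_label gives a friendlier name.
            fan_id = fan_input.name[: -len("_input")]
            label_file = chip / f"{fan_id}_label"
            label = label_file.read_text().strip() if label_file.exists() else fan_id
            try:
                rpm = int(fan_input.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor listed but not readable right now
            speeds[(chip_name, label)] = rpm
    return speeds


if __name__ == "__main__":
    speeds = read_fan_speeds()
    if not speeds:
        print("No fan sensors exposed; the sensor driver may not be loaded.")
    for (chip, label), rpm in speeds.items():
        flag = "  <-- stopped?" if rpm == 0 else ""
        print(f"{chip:12} {label:10} {rpm:6d} RPM{flag}")
```

A fan that spins but reads well below its siblings is the "weak fan" case described here.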
Thanks for the suggestions, I'll have a play.
I followed an earlier suggestion and tried RL8. I get similar issues: it seems to get further through the Boot Live process, but then just hangs and goes into the reboot loop. Screenshot attached in case that helps…
Looking at your photos, there is something like "disturbance" on the screen. That usually indicates a problem with the GPU or GPU memory; maybe both are overheating. Do you have a spare video card you can swap in? If not, check the fan on the GPU (if it isn't passively cooled) and make sure it is running smoothly. Remove the heatsinks from both the GPU and the GPU's memory, clean the chips' surfaces and those of the heatsinks very thoroughly, then add very small drops of fresh thermal paste to the chips (if thermal pads were used, get fresh pads of the same thickness), then firmly reattach the heatsinks.
Get yourself a generic $15 (not NVIDIA) video card that can do VGA and not much more. Replace what you have and see if that helps. Heck, I've got a pile of video cards up on Ebid for $80.
I was kind of shocked when I went looking for you and saw people trying to sell that one Zotac card for $80 by itself on the Walmart site. There is one Radeon in there. A quick search doesn't show anyone selling the old $15 ATI VGA-only cards anymore. If you have an NVIDIA card in there, it is possible the kernel has a problem with it. Most Linux distros do a wretched job testing with NVIDIA because that requires actual testing with actual hardware. It can't be done with CI/CD.
I'm almost 90% certain you have a power supply problem. Order a brand new power supply for your machine and install it. If I'm wrong, it can't hurt to have a spare.
Seeing your pictures took me back to MS-DOS. Video cards lived in that wild, wild west of 384K above the DOS 640K limit. Lots of wretched utilities would try to LOADHI device drivers. They would probe the video card. When you weren't running graphics, it would report a very narrow memory usage range, so drivers would get loaded into the rest of the region. Then you would load something "graphical" like a game or even Lotus 1-2-3 and you would see garbage like this just before the crash.
STUPID QUESTIONS
- Can you get into the BIOS?
- Does this particular BIOS allow you to change the amount of RAM reserved for your graphics card interface?
- Can you look up just how much your video card really needs?
- Adjust the value accordingly.
I have seen this problem, mostly in Oracle VirtualBox VMs where the video memory isn't set correctly. Too much usually works; too little does not.
It's almost like your video card is mapped onto a region of memory that Linux thinks is memory it can use.
While this is a no-cost thing to check, it's also a low-probability thing. Unless you did something like a BIOS update, whatever you had set that was working with CentOS should be working here. You didn't change video cards, did you?
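One way to see how Linux has actually laid out the address space is `/proc/iomem`. The Python sketch below lists its top-level regions and tags each one as System RAM, a PCI bus window (where a graphics card's memory BARs are placed) or something else. It has to run as root, because unprivileged readers see all-zero addresses. The helper names are hypothetical; the parsing relies on the file's format of `start-end : name` lines indented two spaces per nesting level.

```python
import re

LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def parse_iomem(text):
    """Yield (depth, start, end, name) for each /proc/iomem line; end is inclusive."""
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        yield len(indent) // 2, int(start, 16), int(end, 16), name.strip()


def top_level_map(text):
    """Return top-level regions as (start, end, kind, name), kind in RAM/PCI/other."""
    regions = []
    for depth, start, end, name in parse_iomem(text):
        if depth != 0:
            continue  # nested entries (e.g. a GPU BAR) sit inside a parent region
        if name == "System RAM":
            kind = "RAM"
        elif name.startswith("PCI Bus"):
            kind = "PCI"
        else:
            kind = "other"
        regions.append((start, end, kind, name))
    return regions


if __name__ == "__main__":
    # Run as root: non-root readers of /proc/iomem see zeroed addresses.
    with open("/proc/iomem") as f:
        for start, end, kind, name in top_level_map(f.read()):
            size_mib = (end - start + 1) / 2**20
            print(f"{start:#014x}-{end:#014x} {size_mib:10.1f} MiB  {kind:5} {name}")
```

Nested lines under a `PCI Bus` window whose name is the card's PCI address (as shown by `lspci`) are its memory BARs; comparing them with the BIOS shared-memory setting is a cheap sanity check.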
Hi,
Loadsa info there…
I've been using this GPU for years; it's a dual-output card, as I use this box (among other things) as my primary computer with a dual-screen setup. The graphics card is fine: it works perfectly 100% of the time on CentOS 7. No issues… other than running short of RAM on occasion depending on what I'm doing. It's an NVIDIA GeForce GT 710.
I doubt it is causing the problems with either the install process or booting live, as otherwise it runs perfectly well.
Will have a further think about your comments and have a play…
Thanks,
Ripface
One last stupid question. Does your mobo have any (however crummy) built-in video? If so, pull the card, hook a monitor to one of the motherboard's ports (or its only port) and see if you get further.
I had a Lenovo SFF desktop I loved which had an NVS 310 video card in it for years. It ran Windows 10 because a client mandated MS Office and Outlook . . . don't ask.
One day it would only boot halfway up, then power itself off. Sometimes it would try to restart after failing to start, but most times it would just shut down. When I pulled the video card and hooked up to the onboard video, all was well.
The power supply had gotten weak enough it could no longer run all of the RAM and the add-in video card.
Had it go the other way on a custom build. Video card was actually bad. Problem followed card between machines.
Assuming it is approaching evening where you are, I would at least pull down MemTest86 to a thumb drive and let it run on the box tonight.
If MemTest86 refuses to boot, then these are the only possibilities:
- Very first RAM module is bad. Pull it and boot with less to be sure.
- Your power supply is looking up at feeble hoping to one day be that strong.
- Your video card picked a most inopportune time to partially fail.
Life gets complicated if MemTest86 runs N passes clean. It doesn't exercise any video mode beyond basic VGA text output.
If you don't want to "just replace parts" (which is my general approach), then . . . find a Windows 10 license and installation media you can use. IF you manage to successfully install it (and that is a big IF), take a look at:
https://valuehub.hp.com/tools/hp-software-tools/hp-performance-advisor/
HP (you said it was HP) has performance and diagnostic software for their machines, but it only runs under Windows. Find the version for your machine, install it and run it. It should identify any failing fans or other hardware issues.
On the flip side, if Windows 10 fails to install and dies in pretty much the same way, you need to start swapping out hardware: the video card is the cheapest to swap, but the power supply is the most likely culprit at this point. There is still a small possibility that the BIOS setting for shared video memory is too small for how operating systems want to use the card now. Your older OS probably didn't push into advanced modes requiring the larger shared memory region.
Just my 0.00002 cents.
New Microserver Gen10 arrived today so will be having a play with it. Hopefully once I get it up and running Iāll be able to do some more serious troubleshooting on the Gen8 and report back any findings.
One bummer is that the Gen10 seems to have a different 3.5" disk caddy design from the Gen8. I had hoped to move my RAID10 disk set across, but it looks like I'll need new caddies. Googling this, I find lots of DL Gen10 caddies but nothing for the Microserver Gen10… and I'm not sure if they are the same design. Anyone know?
Cheers,
Ripface
As a general rule, disk caddies are specific to a model. It's like Apple changing the size of the charger connector so you have to purchase another $40 charger they make for $6. I know the plastic caddies for my HP z820 are being sold by HP dealers and "refurbished" sellers for about $50 each.
If you are speaking of an HP Microserver Gen 10 (please forgive my ignorance here), according to this link the 2.5" caddies are the same. I don't know how much stock I would put in that.
According to this message thread, HP only sold the caddies with drives in them. Wow! That's a really long message thread. I didn't scroll to the end to see if they actually found a source.
I guess you could always just buy HP drives with the caddies already on them. Imagine that will be nice and cheap . . . NOT!
IIRC, HP (or was it one of the other vendors?) at one point refused to sell empty caddies. Caddies came only with the vendor's HDD: no "buy a caddy and get the HDD from a much cheaper source".
It was HP. The company famous for "starter cartridges" in their inkjet printers.
They don't list caddies by themselves, at least at a quick glance, but this place has some not-bad prices on drives with caddies. They have a "live chat" too, though it is probably a bot. You really need to get a list of the possible part numbers. The $145 4TB drive doesn't sound unreasonable.
According to everything I've read, you should have bought a Gen9. Its caddies and drives are interchangeable with the Gen8.
Your only hope of buying "just a caddy" is going to be grovelling to a server refurbishment place OR contacting each and every computer/electronics recycling center to buy a dead drive that still has the caddy.
Ok, so the Gen10 loaded up seamlessly with RL9. No problems, no issues, it just installed.
I can't pull the Gen8 apart until I get some rails for the hard drives and move my RAID10 set across to the new platform. I've ordered some generic 3.5" plastic rails, so I'm hoping I can do some butchery on them to get them to fit and slide into the Gen10 enclosure. Will let you know how I get on with that.
Typing this on my new Gen10, seems pretty damn good.
Cheers,
Ripface
P.S. Thanks to all for the help and suggestions, I'll be sure to report back once everything is sorted!
Beginning to wish I had gone Gen9. The Gen10 is starting to piss me off a little as it's a bit of a lardy beast. It serves my network drive OK via Samba, but as soon as I fire up Chrome it dies a death!! I'd expected it to be at least as good as the Gen8, but I'm not getting anything near its performance. The CPU goes to 100% as soon as I start up Chrome, and the load average climbs until web pages are taking 20-30 seconds to load. It doesn't seem to be memory-hungry; I've got two 8GB UDIMMs and they are not maxed out. Of course the CPU is soldered to the mobo, so I guess I'm screwed unless there are any perf tweaks out there that will help.
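To confirm the box really is CPU-bound, compare the load average with the number of CPUs. A minimal Python sketch using the standard `os.getloadavg()` and `os.cpu_count()`; the helper name and the 1.0-per-CPU threshold are illustrative choices, not an official rule. On Linux the load average also counts tasks stuck in uninterruptible I/O wait, so a high figure while `top` shows idle CPU time points at storage rather than the processor.

```python
import os


def load_per_cpu(loadavg, cpus):
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count."""
    return tuple(round(value / cpus, 2) for value in loadavg)


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    one, five, fifteen = load_per_cpu(os.getloadavg(), cpus)
    print(f"{cpus} CPUs, load per CPU: 1m={one} 5m={five} 15m={fifteen}")
    if one > 1.0:
        # More runnable (or I/O-blocked) tasks than CPUs to run them.
        print("Load exceeds CPU count: the box is saturated right now.")
```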
: ((
Ummm,
Are you rack mount? If so, you might want to check into custom configure stuff here:
Search for NewServerLife. Sorry, I can't put the Gen10 info here as I'm only allowed 2 links.
I know nothing about the place, just did a search. I did notice they allowed custom configuration. Didnāt see 3.5 inch caddies listed, but they seemed to have a ton of 2.5 inch.
Here's a totally stupid question:
How close are you to Illinois, and would an HP z820 "solve" your problem?
I currently have it powered off. It started getting the infamous 517 memory fan error. I do have 3 brand new fans on order from China and am awaiting the slow boat. Others have been running with 517 for years, claiming that if you don't have every RAM slot full and you don't run both CPUs pegged, you don't have issues. It does have all the plastic drive cages, and I have two spares.
Must warn you that it loads RAID controller firmware from burnt EPROM at boot as a security measure. I got rid of my RAID drives because I do full image backups each week and that was just too annoying.
Have been debating about parting with it. In process of setting up an i9-gen13 20-core Yocto build machine.
If you believe it is an option for you and we can come to terms off-list, I can clean it off and test install RL9.
You can then find a pigeon . . . errr . . . I mean customer for your Gen 10, or use it to hold down a loose piece of carpet.
Just throwing this out there for you because you have been struggling for a while now.
I don't run web services on the thing. I do Yocto builds in very beefed-up VMs. With the crummy 25Mbps line-of-sight Internet I used to have, it could complete a from-scratch Yocto build for Toradex ARM hardware in under 3.5 hours. The biggest, baddest i7-gen4 machines loaded to the hilt generally take over 25 hours.
Just trying to give you a horsepower comparison, if that means anything to you.
…erm, thanks for the offer!
I'm a bit away from Illinois… a few thousand miles. Alford, Scotland!
Rack mount? It's an HP Microserver Gen10, so it sits on the carpet in my study.
I'll have a butcher's at the HP Z820… might take a punt if it floats my boat.
Cheers,
Ripface
Ok, Update!
I removed the ASUS graphics card and tried installing using the VGA connector. For some reason the damn thing would not talk to my HP E241L monitors, claiming it could not handle the refresh rate. Eventually I resorted to taking my lounge telly apart and using the VGA connector at the back of it. This worked, and the install is now running OK. I'm hopeful that once the install is complete I'll be able to pop the graphics card back in and it might work OK.
So it looks like the GPU was the source of all the problems; it seems the install process does not like it!! Weird…
Just to tie things up…
Rocky Linux 9.4 now successfully installed on the Microserver to a 256GB SSD
It seems it would only work in AHCI mode; I'd factory-reset the BIOS, which put it in RAID controller mode.
The ASUS graphics card would not work post-install: the Microserver would not boot successfully with the card installed and would just hang… in a similar way to the install failures.
Thanks for all the help!!
Probably unrelated: I had a desktop (Intel-based, Asus? motherboard). It had two HDDs set up as an Intel fakeRAID mirror (RAID mode). There was an NVMe card in a PCIe slot. System on the NVMe, some data on the "array". Fine.
Then I added an M.2 NVMe drive: invisible to Linux (el8). I switched to AHCI mode and the drive became visible (but the array was lost). MS Windows had no issue; Linux drivers (at least el8) have issues with the Intel chipset's "RAID" mode.
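To check which mode the storage controller is in from a running Linux system, the PCI class code in sysfs is enough: under base class 0x01 (mass storage), subclass 0x04 is a RAID controller, 0x06 a SATA controller (programming interface 0x01 meaning AHCI) and 0x08 an NVM controller. This Python sketch lists mass-storage devices and the kernel driver bound to each; the function name is hypothetical. If the board is in RAID mode and no NVM controller shows up, the NVMe drive may be hidden behind the RAID controller, which would fit the behaviour described here.

```python
from pathlib import Path

PCI_ROOT = Path("/sys/bus/pci/devices")

# PCI base class 0x01 is "mass storage"; the subclass says what kind.
STORAGE_SUBCLASSES = {
    0x04: "RAID controller",
    0x06: "SATA controller",
    0x08: "NVM controller",
}


def storage_controllers(root=PCI_ROOT):
    """Return (pci_address, description, driver) for each mass-storage device."""
    found = []
    for dev in sorted(root.iterdir()):
        # The class file holds a 24-bit value such as 0x010601.
        pci_class = int((dev / "class").read_text().strip(), 16)
        base = pci_class >> 16
        sub = (pci_class >> 8) & 0xFF
        prog_if = pci_class & 0xFF
        if base != 0x01 or sub not in STORAGE_SUBCLASSES:
            continue
        desc = STORAGE_SUBCLASSES[sub]
        if sub == 0x06 and prog_if == 0x01:
            desc += " (AHCI)"
        # "driver" is a symlink to the bound driver's directory, if any.
        driver_link = dev / "driver"
        driver = driver_link.resolve().name if driver_link.exists() else "(no driver bound)"
        found.append((dev.name, desc, driver))
    return found


if __name__ == "__main__":
    for address, desc, driver in storage_controllers():
        print(f"{address}  {desc:28} driver: {driver}")
```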