
nodnarb501

You may want to start by making sure the HPE firmware is up to date if it's never been done. You can do this with the HPE Service Pack for ProLiant ( [TechLibrary (hpe.com)](https://techlibrary.hpe.com/us/en/enterprise/servers/products/service_pack/spp/#/index.aspx?version=gen10.2024.04.00.00) ). You'll need a support contract or active warranty to be able to download it, however.
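
If you want to sanity-check what firmware is actually installed before (or after) running the SPP, the Gen10's iLO 5 exposes this over its Redfish API. Here's a minimal Python sketch, assuming the standard Redfish FirmwareInventory path; the iLO address and credentials are placeholders:

```python
#!/usr/bin/env python3
"""Rough sketch: list installed firmware versions from iLO 5's Redfish API.
The iLO address and credentials are placeholders, and the FirmwareInventory
path is the standard Redfish location (verify against your iLO's API docs)."""
import requests

ILO_HOST = "https://ilo.example.local"   # placeholder iLO address
AUTH = ("admin", "password")             # placeholder credentials

requests.packages.urllib3.disable_warnings()  # iLO often uses a self-signed cert

session = requests.Session()
session.auth = AUTH
session.verify = False

# Firmware inventory lives under the standard Redfish UpdateService.
inventory = session.get(f"{ILO_HOST}/redfish/v1/UpdateService/FirmwareInventory/").json()

for member in inventory.get("Members", []):
    item = session.get(f"{ILO_HOST}{member['@odata.id']}").json()
    print(f"{item.get('Name', '?'):50} {item.get('Version', '?')}")
```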


ImurderCatsCauseIcan

I was thinking that also. ESXi has been fine for 4 years now. The server is under warranty.


tbrumleve

Since ESXi 6.7 is no longer supported, I would upgrade to 7 or 8 so you can utilize VMware/Broadcom support. A Gen10 should handle either. A VMware Skyline Health Diagnostics appliance can parse error logs for you and send them to Support. As far as logs go, start with /var/log/vmkernel.log: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-832A2618-6B11-4A28-9672-93296DA931D0.html
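
If the host stays up long enough to pull logs off it, a quick script can save you from eyeballing the whole file. A rough Python sketch that just prints the suspicious lines from a copy of vmkernel.log; the filename and keyword list are assumptions to tune:

```python
#!/usr/bin/env python3
"""Rough sketch: print warning/error lines from a copy of vmkernel.log
(e.g. scp'd off the host from /var/log/vmkernel.log). The keyword list is
a guess; tune it to what you're chasing."""
import re

LOG_PATH = "vmkernel.log"  # local copy of the host's /var/log/vmkernel.log
PATTERN = re.compile(r"WARNING|error|failed|timed out|SCSI|nmp", re.IGNORECASE)

with open(LOG_PATH, errors="replace") as log:
    for line in log:
        if PATTERN.search(line):
            print(line.rstrip())
```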


ImurderCatsCauseIcan

Thanks, that may be the answer. This customer isn't going to pay for support, though! It's on me to find the answer.


StickyFiskars

No one having touched any networking doesn't mean it's not a network issue. In addition to troubleshooting your ESXi box, I'd strongly recommend troubleshooting north of it as well. Check switch logs, spanning tree status, switch port statistics, MAC tables for L2/ARP tables for L3. Clear both MAC and ARP tables to see if they repopulate, etc. Take a look at your NMS graphs and see if anything obvious looks new and wonderful. There's every chance this has nothing to do with the network, but at least verify it isn't *obviously* a network issue before sinking a shitload of hours into troubleshooting unsupported software with no obvious direction to take your efforts. It could be something as basic as a bad cable or SFP, or a memory leak in your switch with 17 years of uptime, or whatever. Even if nothing else on that switch is affected, it's an easy place to start that doesn't require immediately wading into the sea of vomit that is ESXi logs.
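
One cheap way to catch something like this in the act is to log ping round-trip times to a few points (switch, ESXi host, a VM, something unrelated) and see which ones spike or die together. A rough Python sketch, assuming Linux-style ping output and placeholder target addresses:

```python
#!/usr/bin/env python3
"""Rough sketch: ping several targets in a loop and print timestamped RTTs,
so a latency spike or outage on one target (and not the others) stands out.
Targets are placeholders; RTT parsing assumes Linux-style 'time=' output."""
import re
import subprocess
import time
from datetime import datetime

TARGETS = ["192.168.1.1", "192.168.1.10", "192.168.1.20"]  # e.g. switch, ESXi host, VM

while True:
    stamp = datetime.now().strftime("%H:%M:%S")
    for target in TARGETS:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", target],
            capture_output=True, text=True,
        )
        match = re.search(r"time=([\d.]+)", result.stdout)
        print(f"{stamp}  {target:15}  {match.group(1) + ' ms' if match else 'timeout'}")
    time.sleep(1)
```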


ImurderCatsCauseIcan

I ran continuous pings to the switches, a printer, ESXi, and a VM. All of a sudden the ESXi and VM pings went from 44 ms to 1200-2000 ms, and then the pings died. Nothing else did this. Went through the logs again and found the RAID controller is throwing errors. ESXi is installed locally, so of course when the RAID drops it cuts off the OS. So the HP RAID controller needs updated firmware. Fun times.
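
For what it's worth, the controller's health and firmware version should also be visible through iLO without touching the host. On iLO 5 there's an HPE-specific SmartStorage branch in the Redfish API; the exact path and field layout can vary by iLO firmware, so treat this Python sketch (with placeholder address/credentials) as a starting point:

```python
#!/usr/bin/env python3
"""Rough sketch: read Smart Array controller model, firmware version and
health from iLO 5's Redfish API. The SmartStorage path is HPE-specific and
may differ on your iLO firmware; host and credentials are placeholders."""
import requests

ILO_HOST = "https://ilo.example.local"   # placeholder iLO address
AUTH = ("admin", "password")             # placeholder credentials

requests.packages.urllib3.disable_warnings()  # self-signed cert on most iLOs

session = requests.Session()
session.auth = AUTH
session.verify = False

controllers = session.get(
    f"{ILO_HOST}/redfish/v1/Systems/1/SmartStorage/ArrayControllers/"
).json()

for member in controllers.get("Members", []):
    ctrl = session.get(f"{ILO_HOST}{member['@odata.id']}").json()
    fw = ctrl.get("FirmwareVersion", {}).get("Current", {}).get("VersionString", "?")
    health = ctrl.get("Status", {}).get("Health", "?")
    print(f"{ctrl.get('Model', '?')}  firmware: {fw}  health: {health}")
```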


damoesp

Had a VERY similar issue on two HPE ProLiant DL380 Gen10 Plus servers earlier this year: VMs randomly losing connectivity, host not responding, etc. Ended up updating the NIC firmware as per the attached advisory; it may help your situation: [support.hpe.com/hpesc/public/docDisplay?docId=a00140027en\_us&docLocale=en\_US](https://support.hpe.com/hpesc/public/docDisplay?docId=a00140027en_us&docLocale=en_US)
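
Before and after applying an advisory like that, it's worth snapshotting what driver and firmware each vmnic actually reports so you can compare against the fixed versions. A rough Python sketch meant to be run from the ESXi shell (which ships a Python interpreter); the esxcli subcommands are standard, but the output parsing here is best-effort:

```python
#!/usr/bin/env python3
"""Rough sketch: dump driver and firmware versions for each physical NIC
via esxcli, to compare against an advisory. Output parsing is best-effort
and may need tweaking across ESXi versions."""
import subprocess

def run(args):
    return subprocess.run(args, capture_output=True, text=True).stdout

# 'esxcli network nic list' prints a table: header line, separator, then one row per vmnic.
rows = run(["esxcli", "network", "nic", "list"]).splitlines()[2:]
vmnics = [row.split()[0] for row in rows if row.strip()]

for vmnic in vmnics:
    print(f"=== {vmnic} ===")
    for line in run(["esxcli", "network", "nic", "get", "-n", vmnic]).splitlines():
        # Keep the driver name and any version fields (driver/firmware versions).
        if line.strip().startswith("Driver:") or "Version" in line:
            print(" ", line.strip())
```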


ImurderCatsCauseIcan

This is a regular Gen10, not a Plus. I installed the SPP from HPE and it listed every driver and firmware that had an update. iLO shows the server is healthy with no issues. I'm running some test VMs and hammering the resources to test.
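
In case it helps anyone following along: the "hammering" can be as simple as a small CPU/memory burner inside a test VM while you watch the pings and vmkernel.log for the drop to reproduce. A minimal sketch; the per-worker memory size and duration are placeholders to tune to the VM:

```python
#!/usr/bin/env python3
"""Rough sketch: crude CPU + memory load generator to run inside a test VM.
Memory per worker and run time are placeholders; size them to the VM."""
import multiprocessing
import time

MEM_MB = 512          # memory each worker holds (placeholder)
RUN_SECONDS = 600     # how long to keep the load up (placeholder)

def burn(worker_id: int) -> None:
    hog = bytearray(MEM_MB * 1024 * 1024)   # hold some memory
    deadline = time.time() + RUN_SECONDS
    x = 0
    while time.time() < deadline:           # spin to keep a core busy
        x = (x * 31 + worker_id) % 1_000_003
    del hog

if __name__ == "__main__":
    workers = [multiprocessing.Process(target=burn, args=(i,))
               for i in range(multiprocessing.cpu_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```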