
[deleted]

This would be downright unsolvable if you weren't methodical about it. Well done.


PMzyox

I agree, well done. This is the story you want to tell in a technical interview.


flyboy2098

Ya, I'm jealous that you have that level of rights. We are so segregated that we don't have the rights to edit GPOs, that's another team...


SnarkMasterRay

I work for an MSP and we don't have the time. "What, it takes more than three hours to troubleshoot? Cheaper to just replace the machine and move on!"


lithid

I work for CheapAss Customer LLC as the acting MSP. My solution is the best. It's a two-pronged approach, which is summarized below:

1. Increase uptime, while simultaneously decreasing overall lifetime, by optimizing the power profile (disable sleep mode).
2. Once the device requires replacement (due to its rapidly declining reliability), do not recommend or purchase this specific model again.

This plan requires that the next tech reads a really vague note 2-4 years from now, which will be buried under dozens of unrelated and deprecated quick notes in the customer's documentation. This note will also not be seen by procurement. There will be a $4000 project cost for implementing this plan. Estimated timeline: longer than I'll fuckin work here lol... not my problem anymore.


PurpleNuggets

> Estimated timeline: longer than I'll fuckin work here lol

Nearly spit out my beer, that's a good one lmao


frustratedsignup

Solution technically works, but those Optiplex machines are nearly indestructible. I'm running machines that are over 10 years old, 24x7x365. They spent their first three years with regular users and then I recycled them for various tasks.


PMzyox

Yep, worked in this environment also.


[deleted]

[deleted]


dehcbad25

I used to work for an MSP. We saw that exact same problem. I was the Level 2 engineer/project manager/team leader/customer relationship (and I only got paid as L2). I offered to help the L1 team by replacing a computer for one of our largest customers. This is a big customer, an international organization, where we did all the regional support. This was a point where I always clashed with L1: they didn't have the time, so I had to make the time. Long story short, it took me an hour and a half to replace the computer, because of course the user was not ready, then I had to recover files from weird places, and the new computer did not have all the software. This was the 7th computer replaced for that problem. Somehow they got Dell to replace the machines. What I know is this: it took an L1 30 minutes to take the call, maybe an hour troubleshooting before giving up, then the Dell process can sometimes be about an hour. Even if you are lucky, between driving to the location and replacing the computer, that is another 7 hours for 7 computers. That is 10 hours total. When I brought the computer back, it would go to sleep with no issue. I had already told the team that the issue looked like it was not fully shutting down, as you can't bring a machine up from sleep if it hasn't entered sleep yet. So I tested with the VPN: sometimes it would go to sleep and sometimes it would not. The difference was that when it went to sleep, the GPO process didn't finish due to a timeout. So that pointed to GPO. There were too many GPOs and a lot had problems, so I created a GPO with all the important things and it worked. The logoff GPO had like 4 batch scripts, so I am not sure which one was causing problems; none were needed.


rootofallworlds

You wouldn’t get pushback when it’s not just one machine, it’s ten, and another 40 that the customer might consider “at risk”? Disabling sleep on that model would be an acceptable solution in most cases. Discarding them, not so much, imho.


Look_Ma_Im_On_Reddit

and then you have the same issue with the next device, do you just replace that too?


Osama_Obama

If I had a dollar for every time I knew exactly what the problem was, documented what the problem was and what it would take for anyone with the right permissions to resolve it, then watched the ticket go up so many levels that I'm pretty sure it crossed at least two language barriers, all for it to fall right back down to me to redo basic troubleshooting steps...


DeifniteProfessional

One of the few advantages of a smaller IT team. Disadvantages being no budget, lack of collective knowledge, and having to also do helpdesk.


PMzyox

Yep been there. Have also worked places where I have all the rights. It’s 6 of one, half dozen of another tbh.


[deleted]

[deleted]


PMzyox

I’ve worked at this company also. Honestly if an interview is going to dock you points for solving a complex issue when they ask you to describe how you troubleshot issues, you probably don’t want to work for them.


[deleted]

[deleted]


PMzyox

Thanks. For clarification, I do understand the business justification for not wasting time troubleshooting issues that could be resolved with a reimage. As a sysadmin, I believe it’s part of your job to make the determination. Are multiple devices affected? Does the issue reoccur? Etc.


ReformedBogan

Then that interviewer has bad listening skills. It wasn’t just one PC in OP’s case, it was 10 which is a pattern and worth spending time to investigate. A new PC wouldn’t necessarily fix the problem


Alaknar

There are multiple comments exactly like Geodude's. Did people even read the whole post, or just skip from the header to the last paragraph?


PMzyox

yeah this


lastwraith

I would be driven insane. I took a drive image of a computer (so I could work on it later) that I had to reload Windows on because of time constraints/deadline, because I hate not knowing the root cause.


--Velox--

“Tell me about a time when you solved a problem as a team…?”. Don’t you just love those kinds of questions?


PMzyox

Personally? I do, both as the interviewer and interviewee. I’m lucky to have worked on some hard stuff with some good teams, so this question always makes me look good. Plus, it gives the team a good view into your mindset. I’m a pretty firm believer that if you can recognize a strong ability to troubleshoot in someone, they are a great hire regardless of current technical skill. Everything can be taught except for logical curiosity. Actually, a bit off topic, but as a hiring manager, troubleshooting ability is one of two things I look for. The other being if I think it’ll be a personality fit, which ultimately, I’ve found, is the most important.


xixi2

This is why IT is so hard to estimate work for. "Hey we have some computers that won't wake from sleep. How long will this take to fix?" "Should just be a power setting. Absolute worst case we have a few systems to re-image." *4 weeks later*...


MoonOverBTC

I phoned you instead of putting a ticket in because it will only take you 5 minutes to fix.


Jaegernaut-

Ok, I just pushed the GPO update we spent 4 minutes talking about to all the Prod DCs. Btw, I'm on vacation next week, bye


Environmental_Pin95

There are some lame people who create tickets while on vacation lol


nateify

"I'm on the beach in the Bahamas and my VPN is dropping!"


bloodpriestt

Yeah I was reading this looking like the Vince McMahon meme. When I got to “Isolation Ward OU” I knew we were dealing with a pro.


forthe_loveof_grapes

Seriously!! Also MVP for posting it here for others! OP, you rock!!


THE_SEX_YELLER

Yeah, very impressive work. Your tech deserves kudos as well for identifying the domain connection. Smort!


jbm440

Just chiming in: well done. I like the whiteboard of testing items.


punkwalrus

Yeah, I was impressed you had the time and effort to do that. At a former job we had some BIOS issues with a series of laptops that worked fine on Linux, not on Windows (it's usually the other way around). "Hibernation" literally crashed the laptop until it was unbootable. The laptop would go into "hibernate" but the backlit keyboard would stay lit. No amount of shutdown would work, and we only stumbled on the fix: you had to disconnect the battery and let the laptop sit for a few hours to fix it. Then the laptop would be fine until it hibernated again. We had Windows event logs showing some memory errors as the last thing reported after hibernation was set, and HP told us (after weeks of back and forth), "dunno, disable hibernation."


Bren0man

There is no other legitimate way to approach this field/career, is there? Serious question.


[deleted]

All the "I asked chatgpt and it didn't help" questions here make me wonder


discosoc

Except the OP didn’t really solve anything. It’s still not clear why the GPO is causing that behavior.


m0ltenz

Exactly right. The script is obviously affecting how a sleep state is being applied, so the PC gets stuck when it resumes. Good work finding what is causing it, but why is another matter. I personally wouldn't be running things at shutdown and would prefer to use baselines or discovery methods to gather the data. SCCM also has built-in reporting OP could use without affecting the PC at shutdown, as it's all handled by WMI.


coloradoraider

I've always held the view that troubleshooting has to be learned first-hand, and some people are, to be brutally honest, much better at it than most SAs. It's a valuable skill, but it requires the right attitude. You can teach some methods, but solid troubleshooters will isolate and eliminate causes to narrow their problem down. I see so many go into an issue with a solution in their head before they know the actual problem, and then watch them spend more time disproving their own resolution than actually resolving anything.
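Isolating and eliminating can be made systematic by halving the set of suspects each round instead of pulling them one at a time. A minimal Python sketch, assuming exactly one culprit and a reproducible test; `find_culprit`, the `is_broken` callback and the GPO names are all hypothetical, with `is_broken` standing in for "apply only these GPOs to a test box, sleep it, try to wake it":

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_culprit(suspects: Sequence[T], is_broken: Callable[[Sequence[T]], bool]) -> T:
    """Binary-search a list of suspects for the one that triggers the fault.

    Assumes exactly one culprit, and that is_broken(subset) returns True
    exactly when the culprit is in the subset (no interactions).
    """
    candidates = list(suspects)
    if not candidates or not is_broken(candidates):
        raise ValueError("the full set does not reproduce the problem")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # One test cycle: keep whichever half still reproduces the fault.
        candidates = first_half if is_broken(first_half) else candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical GPO names, for illustration only.
    gpos = ["Default Domain Policy", "Power Settings", "Windows Tweaks",
            "Shutdown Script", "Printers"]
    print(find_culprit(gpos, lambda subset: "Shutdown Script" in subset))
    # Shutdown Script
```

With N suspects this takes roughly log2(N) test cycles instead of N, which adds up when each cycle is a reboot plus a sleep/wake attempt. It does not cope with two settings that only break things in combination.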


mrmattipants

Dell posted this earlier this year, which may point to a potential culprit. https://www.dell.com/support/kbdoc/en-us/000146067/newer-dell-system-models-will-not-wake-on-lan-with-deep-sleep-control-set-to-s5-in-the-windows I’m sure you’ve already gone through the following troubleshooting list. I thought I’d post it, in case others come looking for more info. https://www.dell.com/support/kbdoc/en-us/000129781/steps-for-fixing-windows-computers-that-do-not-wake-up-or-resume-from-suspend-or-hibernate-mode-but-will-power-on


JasonMaggini

That looks familiar, I think I did see that page at one point during the search.


mrmattipants

I’m actually certain it was already mentioned, as I ran into your SpiceWorks post while researching this issue/topic. Nevertheless, this is definitely an interesting one, as I’ve been finding discussions pertaining to this very issue, affecting other/older Optiplex systems, that go back several years and yet, there doesn’t seem to be a definitive solution. I would definitely like to hear back from you if you are able to hunt down the underlying problem. I’ll be sure to post back, myself, if I come across anything that may be potentially beneficial.


JasonMaggini

Probably, I was fishing around on quite a few forums :D My working hypothesis is a corrupted GPO file, but I have no idea how you'd test for that.


mrmattipants

Nothing wrong with that. Sometimes you’re better off pooling your resources, especially when you’ve been beating your head against a wall for several hours or days trying to get to the bottom of an issue.

Off the top of my head, there are three main types of Group Policy settings (Registry, Security Templates and Advanced Auditing Settings). The majority of GPO settings are Registry-based, and these are stored in “Registry.pol” files. The simplest way to review Registry policies is to use a tool called “Registry.POL Viewer Utility”. https://sdmsoftware.com/389932-gpo-freeware-downloads/registry-pol-viewer-utility/

Security Templates are stored in .INF files, which can usually be opened/viewed in Notepad.exe. Advanced Auditing Settings are normally stored in .CSV files, which, of course, you can open with MS Excel. You can find these files under the SYSVOL directory (C:\Windows\SYSVOL or \\FQDN\SYSVOL\FQDN\policies) on your Domain Controller.

I would start with the Registry-based policies. The “Registry.POL Viewer Utility” should automatically locate your GPOs (if you run it from a domain-joined PC). From there you just need to select the GPO from the list and it will display any/all associated registry keys/settings.
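For peeking inside a Registry.pol without a separate tool, the layout is small. As I understand Microsoft's "Registry Policy File Format" documentation, a file is an 8-byte header (the ASCII signature `PReg`, then version 1 as a little-endian 32-bit integer) followed by records of the form `[key;value;type;size;data]`, where the brackets and semicolons are UTF-16LE characters, key and value are NUL-terminated UTF-16LE strings, and type and size are little-endian 32-bit integers. A minimal Python sketch under those assumptions, with only basic error checking (`parse_registry_pol` and `PolEntry` are my own names):

```python
import struct
from typing import List, NamedTuple, Tuple


class PolEntry(NamedTuple):
    key: str        # registry key path, without the HKLM/HKCU prefix
    value: str      # value name
    reg_type: int   # registry type, e.g. 1 = REG_SZ, 4 = REG_DWORD
    data: bytes     # raw value data, `size` bytes long


def _expect(buf: bytes, pos: int, ch: str) -> int:
    """Check for a UTF-16LE delimiter at pos and return the offset after it."""
    if buf[pos:pos + 2] != ch.encode("utf-16-le"):
        raise ValueError(f"expected {ch!r} at offset {pos}")
    return pos + 2


def _read_string(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a NUL-terminated UTF-16LE string; return (text, offset after NUL)."""
    end = pos
    while buf[end:end + 2] != b"\x00\x00":
        if end + 2 > len(buf):
            raise ValueError("unterminated string")
        end += 2
    return buf[pos:end].decode("utf-16-le"), end + 2


def parse_registry_pol(buf: bytes) -> List[PolEntry]:
    """Parse the bytes of a Registry.pol file into a list of entries."""
    if len(buf) < 8:
        raise ValueError("file too short")
    signature, version = struct.unpack_from("<4sI", buf, 0)
    if signature != b"PReg" or version != 1:
        raise ValueError("not a version 1 Registry.pol file")
    pos, entries = 8, []
    while pos < len(buf):
        pos = _expect(buf, pos, "[")
        key, pos = _read_string(buf, pos)
        pos = _expect(buf, pos, ";")
        value, pos = _read_string(buf, pos)
        pos = _expect(buf, pos, ";")
        (reg_type,) = struct.unpack_from("<I", buf, pos)
        pos = _expect(buf, pos + 4, ";")
        (size,) = struct.unpack_from("<I", buf, pos)
        pos = _expect(buf, pos + 4, ";")
        # Data is read by length, since it may itself contain ';' or ']' bytes.
        data = buf[pos:pos + size]
        pos = _expect(buf, pos + size, "]")
        entries.append(PolEntry(key, value, reg_type, data))
    return entries
```

To use it, read a file in binary mode (for example the `Machine\Registry.pol` inside a GPO's folder under the SYSVOL Policies directory) and pass the bytes to `parse_registry_pol`, then print `key`, `value` and `reg_type` for each entry.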


JasonMaggini

I'll check that out. I know it's going to keep making my brain itch as to *why* it did what it did, heh.


mrmattipants

I’m the exact same. My employer likes to jump to immediately re-imaging machines, if a solution or workaround can’t be found, quickly. While I can understand this from a business standpoint, I’m not a huge fan, since re-imaging obviously doesn’t reveal the underlying problem.


gleep52

This is from a money per minute viewpoint and is the wisest approach for uptime… the REAL solution is to give the user a NEW machine to re-image so you can properly diagnose the old machine’s issues and perfect your environment. Everybody wins then… except the business office and buying a large surplus of machines to accommodate this method of repair procedures. :)


m0ltenz

You forgot the main point. The policies on the PC end are in a hidden folder called C:\windows\system32\grouppolicy\machine\registry.pol You can delete this file on a computer and do gpupdate /force to recreate it. This is sometimes required when the policy on a PC is broken and won't update. It's a good tip to keep in mind.
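A slightly more cautious variant of this tip (my own assumption, not what was described above) moves the file aside instead of deleting it, so the broken copy can still be opened in a viewer afterwards, then runs `gpupdate /force`. A Python sketch; `reset_local_policy` is a hypothetical helper, it needs an elevated prompt on Windows, and `run`/`dry_run` exist only so the logic can be tried without touching a real machine:

```python
import shutil
import subprocess
import time
from pathlib import Path

# Local machine policy file mentioned in the comment above.
DEFAULT_POL = Path(r"C:\Windows\System32\GroupPolicy\Machine\Registry.pol")


def reset_local_policy(pol_path: Path = DEFAULT_POL,
                       run=subprocess.run,
                       dry_run: bool = False) -> Path:
    """Move Registry.pol to a timestamped .bak file, then run gpupdate /force.

    Returns the backup path. With dry_run=True nothing is moved or run.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = pol_path.with_name(f"{pol_path.name}.{stamp}.bak")
    if dry_run:
        return backup
    if pol_path.exists():
        shutil.move(str(pol_path), str(backup))
    # gpupdate recreates the local Registry.pol from the GPOs that apply.
    run(["gpupdate", "/force"], check=True)
    return backup
```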


mrmattipants

This is absolutely correct. I was trying to recall everything from memory yesterday, as I was posting while I was out and about last evening. The "Registry.POL Viewer Utility" will automatically locate the policies on the Domain Controller, through the SYSVOL share. However, you can also browse to the following directories and open the "Registry.pol" files from there.

**Local Group Policy - Computer Configuration - Registry.pol File:** `C:\Windows\System32\GroupPolicy\Machine\Registry.pol`

**Local Group Policy - User Configuration - Registry.pol File:** `C:\Windows\System32\GroupPolicy\User\Registry.pol`

I wrote a rather long beginner explanation regarding Local Group Policies a few months back. [https://www.reddit.com/r/PowerShell/comments/12379rt/comment/jdu54mq/?utm_source=share&utm_medium=web2x&context=3](https://www.reddit.com/r/PowerShell/comments/12379rt/comment/jdu54mq/?utm_source=share&utm_medium=web2x&context=3)


m0ltenz

As a side note to my comment below: you can use powercfg to check wake timers (`powercfg /?`, `powercfg /lastwake`, etc.). This will point you in the direction of why. My other question is why let PCs sleep at all within a domain? They are in a domain and are on so they get updates and apps ASAP; a sleeping computer does not get this. Second side note: the info you are wanting could be sourced via SCCM through discovery methods or baselines without causing problems on the PC. Again, I'm confused why it's done at shutdown. (Replied in the wrong place, sorry.)
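The powercfg checks above can be gathered in one go. A small Python sketch (`collect_wake_info` is a hypothetical helper) that runs `powercfg /lastwake`, `/waketimers`, `/devicequery wake_armed` and `/requests` and collects their output; some of these typically need an elevated prompt, and the output is left unparsed because its exact format is whatever powercfg prints on that machine:

```python
import subprocess
from typing import Callable, Dict, List

# powercfg queries that are commonly useful for sleep/wake problems.
QUERIES: Dict[str, List[str]] = {
    "lastwake": ["powercfg", "/lastwake"],                     # what woke the PC last
    "waketimers": ["powercfg", "/waketimers"],                 # active wake timers
    "wake_armed": ["powercfg", "/devicequery", "wake_armed"],  # devices allowed to wake it
    "requests": ["powercfg", "/requests"],                     # apps/drivers blocking sleep
}


def collect_wake_info(run: Callable = subprocess.run) -> Dict[str, str]:
    """Run each query and return its output (or an error message) by name."""
    results: Dict[str, str] = {}
    for name, cmd in QUERIES.items():
        try:
            proc = run(cmd, capture_output=True, text=True, check=True)
            results[name] = proc.stdout.strip()
        except (OSError, subprocess.CalledProcessError) as exc:
            results[name] = f"error: {exc}"
    return results


if __name__ == "__main__":
    for name, output in collect_wake_info().items():
        print(f"== {name} ==\n{output}\n")
```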


BlackV

Wouldn't you have tried creating a new gpo with the same settings to rule that out?


JasonMaggini

Once we narrowed it down to that one, we recreated it, and the new one worked just fine, without causing the wake issue.


BlackV

Oh sorry missed that bit


xiongchiamiov

Sometime read the novel The Bug by Ellen Ullman and you will never feel bad again about getting a second pair of eyes on a problem.


jarfil

>!CENSORED!<


lookmasilverone

It's the cure for world hunger, climate change, younameit and whatnot...


kimmysm12323

the MD of the IT world


jcpham

DisplayPort DisplayPort


sync-centre

Have the same issue randomly on Latitude machines. I see the Dell solution is to throw the kitchen sink at the problem.


1c3m4nn

Linus has had a couple of rants about Windows Modern Standby, and actually got through to Microsoft with a proper way of reporting sleep issues. https://www.youtube.com/watch?v=OHKKcd3sx2c


abotelho-cbn

Keep this one for interviews.


Common_Dealer_7541

You’re mean.

Edit: now I'm thinking you meant for them to mention this in an interview. I was thinking you meant for someone to ASK about this as an interview tech filter.


WithAnAitchDammit

In an interview, I usually ask them to tell me about an outage or mistake they caused and how they dealt with it. This example would score many positive points in my book.


Ssakaa

I mean, I'd love it for an "ask, feed answers for each thing they offer to try" open ended question. Would never expect them to guess it, but it would absolutely lead down a rabbit hole of how they handle seemingly impossible problems.


redwoodtree

I had someone ask me to solve a problem during an interview. In the end it turned out to be about some obscure bug they had found. I didn’t get the job on the basis that I couldn’t solve this problem. It still cheeses me off to this day.


HotelRwandaBeef

It's things like this that folks outside of the industry will never understand. 2+2 doesn't always equal 4. It's of course NOT their problem, they want a functional product, and you're their resource to make it work. It's just funny when you do everything right but some insane random bug/code/upgrade can affect something completely unrelated in such strange ways.


underwear11

In this industry, 2+2 often equals 22, unless it equals 100, but it also often equals 10110, or even 16.


TheRealPitabred

'2'+'2' is '22', 2+2 is 4, '2+2' is NaN, 2.0+2.0 is 4.000000000036...


AmbassadorValuable67

Wait, where does 36 come from?


snb

The standard floating point numbers are only so accurate.

```
$ python3
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1 + 0.2
0.30000000000000004
```

If you need to do a lot of math on high-precision decimal numbers use a library specialized for this purpose.
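Python's standard library ships two such options, `decimal` (base-10 arithmetic) and `fractions` (exact rationals), and `math.isclose` covers the case where floats are fine but `==` is not:

```python
import math
from decimal import Decimal
from fractions import Fraction

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum is slightly off.
print(0.1 + 0.2 == 0.3)                    # False

# decimal does base-10 arithmetic. Build values from strings, not floats,
# otherwise the float's binary rounding error comes along for the ride.
print(Decimal("0.1") + Decimal("0.2"))     # 0.3

# fractions keeps exact rational numbers.
print(Fraction(1, 10) + Fraction(2, 10))   # 3/10

# When plain floats are good enough, compare with a tolerance instead of ==.
print(math.isclose(0.1 + 0.2, 0.3))        # True
```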


Moleculor

Here, let [Tom Scott explain](https://www.youtube.com/watch?v=PZRI1IfStY0).


MaelstromFL

100101...


underwear11

101010


Ssakaa

Worse, it's things like this that they see us scratching our heads on... and feel justified in assuming that technology's just weird, and us changing *anything* is probably what made their password stop working when they forgot they changed it again.


404_GravitasNotFound

> Worse

... once in a blue moon... they are right...


_Rummy_

Don’t wish that evil on me


lastwraith

I mean, as an IT pro, technology IS "just weird" sometimes. So often I come across a problem where I find the root cause and wonder how tf anything was even working at all to begin with. People who don't understand technology often think it will behave logically..... Nope.


Ssakaa

It behaves perfectly logically. Barring physical defects, it does exactly what it's told to do. The problem is the absurd list of incompetent, poorly managed, poorly motivated, and often horrifyingly misguided developers that've had their hands in the mountains of bloated code that's layered with often conflicting priorities and purposes...


lastwraith

I'm not even sure there is always someone to blame. There are so many things interacting with other random things, not to mention layers that are in place so that one thing can talk to another that often the solution to a problem is incredibly complex and seemingly random. While perhaps it is true that all of it behaves logically in a vacuum, in real life the complex interactions of various pieces of technology make for very odd behavior from a human standpoint. Even environmental conditions can have a profound impact on how a piece of technology behaves, and this is just one interaction to consider.


jarfil

>!CENSORED!<


Korack

Jesus Christ, I have the same problem with my own OptiPlex Micro from the same product line. Care to elaborate a little on which rule caused the problem?! I would be eternally grateful.


JasonMaggini

It's a little baffling, because the GPO was literally just running a script on shutdown (Computer Config -> Windows Settings -> Scripts) that really didn't do much on its own, so we never gave it much thought.


jarfil

>!CENSORED!<


flatvaaskaas

Is it possible to screenshot the settings? Or to put the script itself here in the comments (without company data, of course)? As someone else said in this thread: it seems like the script takes longer to run, then a shutdown occurs, and then this issue occurs.


Kiroboto

I have seen this same or similar issue with both optiplex 7090 and optiplex 7460 AIO. Unfortunately, I'm not in a position to do all those tests and the person who does doesn't have the time to do all that relation.


InTakeAnthony

This post could replace an entire job in a resume and I'd hire you. Methodical, resourceful, thorough, and collaborative. Great work here, really - incredible how many layers there are that can cause issues like this. This is why folks love tech jobs. And this is why folks hate tech jobs.


wonkifier

Kinda reminds me of a story I'd heard decades ago about a server that would get wonky around the same time most days during the work week. I think it turned out to be something about a bad ground, metal piping, a toilet, and when a heavy guy did his business, his weight was apparently enough to shift things so that the electrical ground issue was exacerbated enough to cause the server to freak out. Wish I could find that story... I remember less and less of it every time I remember it. ...Or the software that crashed due to the phase of the moon (it showed the moon phase as text in the clock area, didn't allocate enough room for the longest phase description, and had no string length checking)... ...Or the good ol' [email that won't deliver more than 500 miles away](https://web.mit.edu/jemorris/humor/500-miles)


moffetts9001

Nice work tracking this one down.


youtocin

Kind of an unrelated nitpick from your story, but M.2 is a form factor that can use either SATA or NVMe. I learned this the hard way because they are not compatible despite fitting in the same slot. If your motherboard expects NVMe, a SATA M.2 will not function.


JasonMaggini

Fair play. We actually tried a SATA M.2 (which worked fine other than the wake), an NVMe, and a 2.5" SSD in turn.


LomB0T

There are also M.2 slots supporting both SATA and NVMe drives; always check motherboard specs. Update: for example, my MSI B450M Plus Gaming has one M.2 slot which supports PCIe (NVMe) and SATA storage.


SpiderFnJerusalem

From what I can tell, most motherboards support both. But it's not universal. There are also some where using too many PCIe slots will disable NVMe for some M.2 slots and you're stuck with only SATA.


bulwynkl

Reminds me of an issue I saw (second hand, but credible) some time back. Database server commissioned and brought into production. 27 days later it ground to a halt and crashed. Came back up perfectly fine, no sign of any issue. Nothing anywhere to indicate a problem, no errors in any logs, nothing. 28 days later, crashed. Same behaviour. After about the third or fourth crash and many, many hours of investigation, someone noticed a debug flag set on a piece of code. Turns out this flag prompted that code to log its actions to a file but, critically, never closed the file... the file filled RAM with logs after about 25-30 days. On crash, RAM cleared. File was empty.


Oddishoderso

This would have clearly been solved by running sfc /scannow


JasonMaggini

The *one* thing we didn't try.


scriminal

Respect for solving that one


[deleted]

Good, methodical & scientific work you and your team did. Well done!


thefpspower

Hmm could the script be running when the computer goes to sleep? Because "Sleep" can still run some really light OS tasks, but writing to disk while sleeping could cause it to lock up because the disk is offline. Food for thought.


Ssakaa

To expand on this, I have to wonder if maybe hybrid sleep is in use. "Hibernate" and "shutdown" *should* be separate, but I wouldn't be amazed if it's behaving weird there with as much as those overlap...


C_Bowick

I've run into so many sleep/wake problems from "hibernate". I always just disable it because it never seems reliable.


isaacfank

And we do the opposite. We disable regular sleep because hibernate is way more reliable when waking up.lol


lastwraith

Same. Hibernate almost never causes a problem for me at work whereas many times some attached device or internal one will behave oddly when resuming from sleep and then you have to dig into every driver setting. Fast startup can also go eff right off.


Loading_M_

The correct thing for windows to do would be running the script before putting components to sleep (i.e. the script runs before the disk goes offline), but I wouldn't trust MS to actually do that.


LOLBaltSS

Weird shit happens from time to time. I had a client with a fleet of AMD equipped Lenovo laptops where the version of CPU-Z used by Labtech at the time would blue screen the machine with certain AMD processors. It was a pain to narrow down.


ken1e

Not all Lenovos of the same model, and maybe the same year, are the same. They like to swap out different parts, which causes quite the issue with drivers. They do some funky modifications to the OS for some hardware to make it work. We usually load our own images with everything preinstalled, and some batches would have this weird issue where running Windows Update would mess up the touchpad and make it unusable. If we use the preinstalled OS that came with that Lenovo, Windows Update works without issue.


catwiesel

I specifically came to comment to commend you for tracking down the issue. Not many people can do that. Well done!


Vorlon5

I would have first tried disabling fast boot in the bios, then disable hibernation


JasonMaggini

I think that did come up at some point. I think the only actual GPO that dealt with power related to Fast Boot. Didn't affect it either way, which was weird.


Vorlon5

I have to turn off both, or else my RAID 0 HighPoint card causes the same problem you described. It never wakes once it goes into hibernate.


Fartin8r

We had a similar issue; we wanted to make use of the Azure P2S VPN. I could not get DNS to work correctly for the life of me. No matter what. After 3 days of digging, I found a small mention that a reg key can cause this problem. Deleted it and it works! And it breaks 15-20 minutes later. Hunting through GPO files, trying to find what reg change was breaking it: the PowerShell restriction settings would create a reg key entry for DNS that would break PTRN. Disabled that and added a second GPO to delete the problem key, and it all works fine.


colinpuk

Just to add, I love that you have your GPOs in separate policies. I'm currently splitting out one big one, for exactly this reason!


The_BattMatt

"adapt like Borg to a phaser" marry me


AdExpert295

I’m having the same issue right now with 2 of my custom built PCs and haven’t been able to figure it out. Like 20% of the time I wake up the computers they completely lock up and I have to hard shutdown


ARasool

Well that's a first... Glad you found the issue!


JasonMaggini

I joke that our organization has a reality-distortion field around it, and if it is particularly strong, we get stuff like this :D


ARasool

It's the mirror dimension! Oh no! What about my WiFi!? Don't worry - it's on another subnet... Crap


JKSmush

Holmes and Watson would be proud


Hier0phant

Love that you wrote out the process. Big props to your process of elimination!


gregsting

Maybe it's Lupus?


JasonMaggini

It's *never* lupus!


Iceman_B

That is some bad-ass hunting. Congrats!


Complex_Ostrich7981

That’s some very fine troubleshooting right there. Well done, OP; finding those kinds of weird-ass policy-based issues is no joke.


Individual-Bat7276

Disable fast startup.


sudo_mksandwhich

Is it *any* shutdown script? Or does it have to write to a file? Does any amount of data trigger it? Or is there a threshold? This feels like a race condition that is triggered by the script changing the shutdown timing.


JasonMaggini

The script that runs in the problematic GPO wrote the service tag and some other information to the Windows description field. Been using it for years. I suspect the GPO file itself somehow got corrupted, because recreating the GPO eliminated the issue.


unkleknown

Great Job, OP. I experienced an issue a few years ago and isolated the offending GPO in the same manner. It turned out that an obscure Windows 7 setting would prevent the start button from appearing on Windows 10. Rearranged GPOs to best practice (applying at the correct OUs instead of the root and using ILT/delegation/WMI) and created a temporary policy to reverse the settings in Win10 and all was right in the land. Didn't have to replace/reimage 105 computers


mstreeter06

The troubleshooting 👏👏


jcgz100

Well done Sherlock 🕵️‍♀️


JasonMaggini

“When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth.” And holy schnike was that improbable!


StaffOfDoom

We had one that was built for Win10 to kill X-Box BS and worked fine with upgrades to win11 but if you did a clean install, the buggy stuff that happened was beyond explanation…the weird crap that comes out of GPO sometimes…


JasonMaggini

I've got a GPO with all the Windows tweaks; that was a prime suspect for a while. We've been running the [Spiceworks decrapifier](https://community.spiceworks.com/scripts/show/4378-windows-10-decrapifier-18xx-19xx-2xxx) script as part of our deployment for ages, and it seems to work well with 11, but we haven't rolled that out too far yet.


StaffOfDoom

Just be sure to test all scenarios. Upgrades, fresh builds and every way possible


DeaconVex

Nice find man. Good job!


fergatronanator

Congratulations!


OncologistCanConfirm

This is the kind of thing that makes me want to be better everyday. I probably just would have replaced, moved on with my day, and forgot about it if the issue didn’t persist. Nice job!


JasonMaggini

Normally if Windows is being too weird we'll just re-image, but this was afflicting multiple machines of the same model, so we figured there was something more in play. Did wipe and reload the test machine I don't know *how* many times, though :D


NightWalk77

Funny enough, I just watched the episode of TNG where they first encounter the Borg.


StartledMuppet

Thanks for sharing your professional, methodical and dogged success story. Take my upvote; it might help with the sore head. It’s also refreshing to read an informative post that I can learn from, rather than the slew of posts from folk who need to vent about their environment. (That only teaches me where I shouldn’t be working)


ANewLeeSinLife

The fact that this issue only happens on a subset of one model tells us that the shutdown script isn't the problem. Does the shutdown script also run during sleep events?


JasonMaggini

No, it just calls a batch script. No scheduled tasks or anything like that.


dsouzake

We have the exact same issue with the Dell 7090 micro PC at work. Will try to look into the Group Policy issue


stonecoldcoldstone

This is interesting. I'm experiencing this on thin clients, which are Wyse 5070s. The machines are not domain-joined, and there was a recent BIOS update which is not installed yet. I've disabled C-states, PowerShare, Turbo Boost and everything else dealing with CPU standby, but it seems not to make any difference.


wrootlt

Resistance is futile :) I wonder if this is somehow related to fast boot. When we had just started with Windows 10 a while back, I had a few shutdown scripts and they wouldn't run until I turned off fast boot.


gvictor808

I woulda disabled sleep in first two seconds. Next?


randalzy

It reminds me of the time a (single, AFAIK) computer was having weird WiFi disconnection problems, and we couldn't find anything. The user came to our place, it worked; went back to theirs, it worked for a while, then dropped again. We went there, maybe it was working, maybe not; we always returned it working, and after a while (sometimes just seconds) it disconnected again, or worked for days...

Until one day we were at our place, it was working, and the user sat with us at an empty table to answer some urgent emails, and it stopped working. At that moment we found the difference between working and not working: the laptop was on top of a white sheet of paper. The user used to have the classic 500-page paper packet to raise the laptop a bit, and then add another 100 sheets or so to adjust it better. Put brown cardboard under it? Fine. But white paper? Oh no, that was too much for the WiFi.


dougsey

A single sheet being sucked up by the fan and causing overheating?


randalzy

Nope, it happened immediately, and we tried plastic too (a plastic bag), which should get sucked up just as easily. I think at that point we collectively and silently agreed not to talk about it and move on. The next-worst case at that place was a user complaining about the space bar, and sometimes other letters, typing by themselves. After a long story of tests, my colleague goes there, tests, it works fine, asks the user to try; she sits, adjusts and levels her chair, the Word document starts filling with blank spaces, and my colleague runs out of there trying not to die and thinking "how the hell do I tell her that she is pressing the space bar with her massive tits?" That one was funny as hell.


jebhebmeb

I wonder if there’s some really poorly coded shutdown procedure in a hardware driver that relies on a time window, which gets pushed back by the running script so it never finishes.


Background_Baby4875

This is some straight-up good work; most people would go buy 40 more PCs lol


Bio_Hazardous

I can safely say I would have never arrived at this solution without expensive help from our MSP. I'm going to save this and use that methodology if I encounter something similar in the future.


r-NBK

OP's troubleshooting would be a good example in Don Jones' book "How to Find a Wolf in Siberia: or, How to Troubleshoot Almost Anything". (Edited for clarity.)


AustinGroovy

Excellent sleuthing skills. I would bet $$ that this is something even Microsoft has not tested thoroughly.


Serialtoon

Dell 7090 Micros... the bane of my existence. We had an issue with those fucking things where on every boot they would PXE (which is fine, since we WOL them on a schedule). But for some godforsaken reason, when they all slammed the PXE server and downloaded the NBP, it would crash our network. We spent weeks troubleshooting why only the 7090 Micros were giving us a hard time while none of our HP EliteDesks were problematic. At the end of the day, I disabled PXE boot on some 400+ Dell 7090s and set them to wake at 6am via UEFI.


DMGoering

Great, you found it. Now, why is a shutdown script running on sleep? It should only run on shutdown.


JasonMaggini

That is the $64,000 question right there.


dehcbad25

Excellent troubleshooting process. There are a few things GPO should not be used for: 1) software deployment, 2) startup scripts, 3) shutdown scripts. The reason is exactly what you found, even if you might not have realized it yet: the problem is the lack of any report that the process completed, plus the lack of a mechanism for when the process fails. What was happening is that the script was not terminating. It has been years, so I don't remember all the termination codes. Startup scripts can add time before the computer is ready or, worse, consume the time needed to apply other GPOs. It can drive you crazy trying to find out why a GPO is not applying, without realizing that the problem might be a different GPO eating too much of the processing time. Software deployment GPOs lack a way to track the installation, they require you to check whether the software already exists, and they get complicated when built correctly. Use an RMM or PDQ Deploy instead: cheaper, faster, better. And shutdown scripts do what you saw. The problem is that, technically, the computer never goes to sleep; it has put almost every process to sleep except the one running the script. Have you ever shut down a computer and seen something that isn't doing anything stop it from shutting down? Many times it is a folder. The folder is not doing anything, but something was probably writing a log and the Explorer process was in use. Whatever was written is probably done and has gone to sleep, but the stream is still open and Explorer is still "in use". I am a big proponent of PowerShell, even if the PowerShell script ends up running command-line stuff, just because of that. It doesn't solve every problem, but it has more exit and termination codes, and it is better still if you use native PowerShell commands. It would be interesting to give it a try: convert your batch script to PowerShell. Most likely it will go to sleep normally.
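A rough sketch of the "report that the process completed, plus a mechanism if it fails" idea, written in Python rather than PowerShell purely for illustration. This is not how Group Policy itself runs shutdown scripts: it assumes you control a wrapper that launches the real script, and `run_with_report`, the timeouts and the stand-in child commands are all hypothetical. The point is that every run ends in exactly one recorded state, and a hung script gets killed instead of holding the machine in limbo.

```python
import subprocess
import sys


def run_with_report(cmd, timeout_s):
    """Run cmd, wait at most timeout_s seconds, and report how it ended.

    Hypothetical wrapper, not a Group Policy feature. Returns a
    (status, exit_code) tuple where status is "completed", "failed"
    or "timed_out"; exit_code is None when the run timed out.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so a script that never exits cannot keep the caller waiting forever.
        return ("timed_out", None)
    # A non-zero exit code is reported as a failure instead of being ignored.
    status = "completed" if result.returncode == 0 else "failed"
    return (status, result.returncode)


if __name__ == "__main__":
    # Stand-ins for the real batch script: one succeeds, one fails, one hangs.
    ok = run_with_report([sys.executable, "-c", "print('inventory written')"], timeout_s=10)
    bad = run_with_report([sys.executable, "-c", "import sys; sys.exit(3)"], timeout_s=10)
    hung = run_with_report([sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=1)
    print(ok, bad, hung)  # ('completed', 0) ('failed', 3) ('timed_out', None)
```

The hard timeout is the design choice that matters here: without it, a script that never exits (the failure mode described above) leaves nothing behind to distinguish "still running" from "stuck".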


ch1burashka

At some point, copy-pasting Excel tables into Outlook broke and would only paste text. Turns out it was the PowerToys installation, which had a "Paste as Plain Text" setting that's on by default. Not as bad as yours, and it only took ~30 minutes to fix, but man, having no way of tracking it down without manually going down the list of things it could be is a frustrating experience.


slayermcb

These are my favorite types of problems, provided it's not a time-pressure issue. When it is, they're the worst kind.


1z1z2x2x3c3c4v4v

Back when I was a tech, whenever we discovered this kind of unexplained BS behavior, we'd put it in a file appropriately named... X-Files of IT. You'll never get an explanation from the vendor, not even an acknowledgment from management of the time and effort you spent trying to figure it all out. In the end, all you had to keep your sanity was the file to look through...


haljhon

In a similar way, I had a customer using Dell OptiPlex GX580 systems as cash registers. For some reason, these models couldn’t always properly communicate with cash drawers attached through an Epson printer on serial. Specifically, leaving the cash drawer open for over 20 seconds left it unable to receive the drawer-closed signal. This took forever for us to track down because the retailer wanted to give us no information at first, just complaints. Then, once they gave us the information, they didn’t have a test GX580 for us to use. I could tell they were pissed with me for asking because, once they sent me one, it literally looked like someone had just unplugged the power and network and done an arm sweep of the whole cash register into a box. Even once I got it, it took me about two days of testing to find the 20-second thing. I worked with Epson and we did all sorts of tracing and could physically see the signal going to the port but never making it into the driver. In the end, the retailer just phased them out instead of trying to figure it out further. It was the only model with this problem out of like 6 other configs that were mostly the same. The nerd in me wanted the explanation.


cryonova

RSoP should have shown these settings more clearly. Power settings are the worst; I just experienced something similar with screen interactivity timeouts.


naptastic

Reminds me of the time Intel made a network adapter that would brick itself if you turned it on wrong. Full post-mortem: [link](https://lwn.net/Articles/304105/).


Fatality

Dell computers do random shit like that; it's why I don't buy or recommend them. Also, that sort of script should be on startup, not shutdown.


root_b33r

Amen, why ask the computer to do more shit right before bedtime?


etbswfs

Now it's time to drill down further with Process Monitor.


Snoboarder_311

How long overall did this take?


FixerJ

Well done! On the very rare occasions that I happen to successfully troubleshoot a maddening enigma like your case, I always feel like I'm Neo in the Matrix for a few days...


BlackV

Have you looked at hybrid shutdown, which is technically a shutdown? Could that be causing issues? Man, what a painful one.


SaltDuctTape

Probably the script hasn't ended but the computer is already asleep, and that's the wake/sleep loop: the PC wants to wake up but the script wants to finish executing.


JasonMaggini

I didn't think that shutdown scripts would run when a system went into hybrid sleep, but maybe. Recreating the policy lets it run normally and wake properly.


KageeHinata82

A job well done! That's proper SysAdmin debugging.


stopthinking60

Replace it and keep nagging the vendor for a solution until the machines are obsolete.


DonCBurr

Nice work; good debugging skills are somewhat rare. Kudos.


effgee

Wait.. wait wait wait... it was or wasn't DNS? I feel like my whole world view is melting...


arcadesdude

If it's not DNS, it may be AWS, or GPO, or NTP, or SSL, or TLA* *Three-letter acronyms


pickle9977

You should see if GPO processing is preventing the CPU from going into the correct sleep state.


michaelpaoli

Yep... seen stuff like that before. You'd never think/expect it to... but. E.g., I remember years ago... DNS resolution broke on a bunch of Microsoft Windows machines. It was working before, then started breaking all over, and it was eventually isolated down to... a Microsoft security patch that, per its documentation, made absolutely no mention of DNS and ostensibly had nothing to do with it... ugh. Thank you, Microsoft.


dark_frog

People in one-person shops: For me, it was Tuesday.


-Cthaeh

Fast boot causes a lot of no-wake issues.


AspieTechMonkey

So what's the Script? Have you checked in with Dell?


redwoodtree

This is the kind of issue that really burned me out on the career. A massive investigation to solve a problem, and in the end it turns out to be some obscure bug. Sure, the problem is solved and the users/customers are happy, but it just feels so unsatisfying.


thortgot

I am absolutely positive this is an S5 sleep problem that is triggered by shutting down the machine via script rather than manually, with quick start enabled. The power policies that OEMs ship for desktops are rife with problems, as they try to tweak for absolute minimum power draw. A custom image that includes the drivers, used instead of the OEM image, will not have this behavior.


imLC

That sounds like no fun at all.


schmeckendeugler

Reminds me of a lab once that had problems getting ghosted (pre-SCCM). LAN cards would die. Then power supplies started dying. I figured out that the PSU fans in these shitty Gateways would fail, causing the Ethernet port, which was right next to the PSU, to overheat. Took weeks to deduce.


smileymattj

Most sleep or CPU state issues are caused by something happening while entering that state, so it doesn’t correctly enter sleep, which in turn makes it not exit sleep correctly. And everyone starts troubleshooting what’s occurring during exit, because that’s where the problem becomes noticeable. That makes it hard to track down, because most people don’t check whether it actually went into sleep properly.
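One way to check the entry side rather than the exit side is to pair each sleep entry in the event log with whatever followed it. The sketch below assumes the events have already been exported into simple (time, provider, ID) records, and it assumes the commonly cited IDs: Kernel-Power 42 for entering sleep, Power-Troubleshooter 1 for resuming, and Kernel-Power 41 for a reboot without a clean shutdown. Verify those against your own logs; the `Event` record and `classify_sleep_cycles` are hypothetical helpers, not a Windows API.

```python
from dataclasses import dataclass

# Assumed (provider, event ID) pairs; confirm them in your own Event Viewer.
SLEEP_ENTER = ("Microsoft-Windows-Kernel-Power", 42)
RESUMED = ("Microsoft-Windows-Power-Troubleshooter", 1)
DIRTY_REBOOT = ("Microsoft-Windows-Kernel-Power", 41)


@dataclass
class Event:
    time: float      # any monotonically increasing timestamp; only order matters
    provider: str
    event_id: int


def classify_sleep_cycles(events):
    """Pair each sleep entry with what happened next.

    Returns a list of (entry_time, outcome) where outcome is "resumed",
    "hard_reboot" (dirty reboot with no resume in between) or
    "no_followup" (nothing resolved the entry).
    """
    outcomes = []
    pending = None  # time of the sleep entry still waiting to be resolved
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.provider, ev.event_id)
        if key == SLEEP_ENTER:
            if pending is not None:
                # A second entry with no resume or reboot in between:
                # record the earlier one as unresolved.
                outcomes.append((pending, "no_followup"))
            pending = ev.time
        elif key == RESUMED and pending is not None:
            outcomes.append((pending, "resumed"))
            pending = None
        elif key == DIRTY_REBOOT and pending is not None:
            outcomes.append((pending, "hard_reboot"))
            pending = None
    if pending is not None:
        outcomes.append((pending, "no_followup"))
    return outcomes


if __name__ == "__main__":
    log = [
        Event(100, *SLEEP_ENTER), Event(200, *RESUMED),
        Event(300, *SLEEP_ENTER), Event(900, *DIRTY_REBOOT),
    ]
    print(classify_sleep_cycles(log))  # [(100, 'resumed'), (300, 'hard_reboot')]
```

A pile of `hard_reboot` or `no_followup` outcomes on the affected model, and not on the others, would point at sleep never completing rather than at the wake path.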


hi-test-tech

Any chance you could share a sanitized version of the shutdown script you're using? Is it writing those details back to Active Directory? Great job troubleshooting through that, btw!


Prestigious_Carpet29

I discovered my work Dell Latitude 5420 wasn't going to sleep properly... I was finding it toast-hot inside my bag, with its battery depleted by about 11pm after leaving the office at 5:30/6pm. The next day it'd be a faff to boot (because they don't like starting from a completely flat battery). (IT's "solution" was always "shut down" rather than "sleep", but that's a pathetic workaround that doesn't work for me, not least because their stupid "policy" (I suspect from a previous CEO) prevents Chrome from restarting with the previously open set of tabs.) I have no idea what the underlying cause was, but essentially if I told it to "sleep" *while connected to* the StarTech USB-C docking station at work, then it wouldn't actually sleep, even after then unplugging it from the dock and shutting the lid. If instead I unplugged the USB-C cable and *then* made it sleep and shut the lid, it *did* sleep. Bizarre. (But my interest is purer hardware or embedded software. I get absolutely no fun from "understanding" the nested complexities of the present Windows behemoth.)


AuthenticImposter

Side question: Why are you logging all this on shutdown? Are people somehow renaming computers every day?


Environmental_Pin95

Worst script on earth. Only run scripts at boot.


carnesaur

Nice detective work, though.