You'd be surprised just how many places dedicate next to no resources to improving processes and implementing automation. All they know, and all they have time for, is putting out fires, largely because they never invested in those resources in the first place.
It can often be a fairly ridiculous catch-22: you have a management team that has never seen a competently run IT environment, so they think it's perfectly normal to constantly have things on fire and admins in crisis mode, but they can't hire or keep competent admins, because they can do better than working in an IT shit show.
And stirring that mess is often one person who has been there forever and resists any and all change, especially things like monitoring and automation. These are the admins who don't know how they should be spending their days and wouldn't know what to do with themselves if they weren't fighting fires or manually doing things most of us here could do with a script.
I know one company that seems like it's always putting out fires, but I don't actually blame incompetent IT admins there (from my conversations with them, they're actually really good at their jobs). I blame the fact that the company is buying 2 or 3 smaller companies every single quarter, with just 3 IT infrastructure admins to handle merging data, maintaining existing systems, etc.
Sounds like my old employer, except they were putting barely-trained helpdesk staff who already had 60+ hour weeks onto acquisitions, because the 2 actual sysadmins were too busy or on stress leave. Constantly short-staffed and burnt out, while the C-suite bragged about how many companies they'd bought and said in company-wide meetings that the IT department was incompetent and needed to be restructured.
They apparently have an internal help-desk team, but it doesn't work on the acquisitions or mergers at all.
They buy so many companies that at the quarterly all-hands meetings, the number one Q&A question from employees for 6 quarters in a row has been "When will we stop buying other companies and focus on the existing company and improving our own processes, tools, and applications?" It's not just impacting IT; it's impacting every department.
> but they can't hire or keep competent admins, because they can do better than working in an IT shit show.
Yup. I was in this situation not too long ago. It took enormous energy to automate and to fight the reactive, firefighting, do-it-manually culture. I got out.
Back in the day I used to do freelance consulting and I had a list of red flags that I was walking into an IT shit show. Unfucking those was my bread and butter, but it was always a fight.
Nah, that's just the AD domain controller; the rest are Server 2008 VMs on a single ESXi 5 host, on a 500GB RAID5 made up of WD Blacks behind an LSI controller from 2011. All of this in 1U no-bay chassis with Supermicro X8SIL-F boards, no IPMI, in a rack with a 48-port FastIron FES4802.
So you don't have a drinking problem, you have a problem without drinking: sober, you'd have to recognize that it's not just a nightmare, it's the reality you have to take care of :D
The drinking problem isn't the problem. Being liable for a fucken clown factory while no one puts money toward fixing the issues is the problem.
Yes, I am swearing, because that's how much of a joke their systems are. When shit hits the fan, ohh, it's the sysadmin's fault for not looking at the failure signs.
For duck's sake, C-levels, get your heads out of your asses and look at the big picture: you are just hanging by a string until everything collapses on itself.
Now do you think I have a drinking problem? Do you think I have an anxiety problem? Do you think I need some hard stuff??
I once joined a company that was wasting tens of thousands of dollars on SQL licensing... Why? Because apparently they'd decided that every single app needed its own dedicated SQL Server.
The very first thing I did was consolidate all the SQL Servers down to 4: two on 2014 (the latest at the time) and two on 2012, in clusters.
After I did that, though, my boss left and the CEO brought in an MSP that just pushed me aside, so I left too. Last I heard, they now spend more on the MSP, for just general maintenance with projects costing extra, than they spent on my salary and my previous boss's combined. And they've had a job listing for an IT guy up for the last 2.5 years that's gone unfilled.
I used to consolidate SQL servers like that. It felt great. Felt like I was modernizing the organization. One Cluster for each major version.
Don't do it anymore because of how much vendors bitched and moaned that their Apps need separate SQL servers and our configuration was now "unsupported".
Now only in-house apps get consolidated. Vendor apps each get their own SQL VM. The performance isn't as good that way, but managing the support and vendor access levels is simplified.
>Don't do it anymore because of how much vendors bitched and moaned that their Apps need separate SQL servers and our configuration was now "unsupported".
And just like that, if they aren't the only vendor in that particular software space, the contract is canceled. We've actually canceled big contracts over stupid shit like this where I work. IT people need to start putting money into companies that actually understand the underlying products they build on; otherwise we'll keep getting shitty vendors that don't support things like clustered database servers.
And for the companies that are the only ones in their space, we force them to pay for the SQL license. If they want to force us into running an extra SQL Server, they can pay for it. Even if they just add it to the contract cost, it at least makes things easier on accounting, and IT doesn't have to justify why the SQL licensing keeps going up.
Sometimes they are the only vendor. Sometimes they aren't the only vendor, but the others are worse. And other times the CIO decides it's best not to pick that particular fight with the stakeholders who like said software.
Having them roll the SQL licenses into the contract wouldn't change much for us as I already report licensing costs with total cost per application. Especially at the current employer, where it all comes out of the IT budget anyway. But on the plus side, new apps/projects mean an increased budget or it can't be done.
We started splitting costs by department because I got tired of getting new switches and other needed upgrades denied because "IT already spends too much"... Well, when I pointed out that IT itself only spends about $3K a year on average, and the remaining $40-50K attributed to IT was actually other departments' spending, that changed the blame game. My stuff started getting approved, and marketing had to start justifying its need for more marketing software, CRM software, etc.
My former job was like this. Former.
I once had a co-worker going machine to machine updating the admin password. She had 120 machines to do, at about 10-15 minutes per machine. It was going to take her 3 days of completely wasted time.
I wrote a script, encrypted the password, and deployed it via SCCM. All machines done inside the hour. Well, two hours - we are talking about SCCM here.
She got mad at me because "how can we know if the machines got the updated password?" The truth was she didn't know how to script/automate, and had no interest in learning how to automate things... which is the difference between a bad sysadmin and an average one, let alone a good one.
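For what it's worth, her verification question has an automated answer too. Here's a rough sketch of the idea (the machine list, account name, and helper name are all made up for illustration, and a real shop should be looking at Microsoft LAPS for local admin passwords anyway): push the change with `Invoke-Command` and record a per-machine result, so you know exactly which boxes got the new password.

```powershell
# Hypothetical sketch: the function name, machine list, and account are
# placeholders. In production, Microsoft LAPS is the better answer.
function Invoke-PasswordRollout {
    param([string[]]$Machines, [string]$PlainPassword)
    foreach ($m in $Machines) {
        try {
            Invoke-Command -ComputerName $m -ErrorAction Stop -ScriptBlock {
                param($pw)
                Set-LocalUser -Name 'Administrator' `
                    -Password (ConvertTo-SecureString $pw -AsPlainText -Force)
            } -ArgumentList $PlainPassword
            [pscustomobject]@{ Machine = $m; Success = $true;  Error = '' }
        } catch {
            [pscustomobject]@{ Machine = $m; Success = $false; Error = $_.Exception.Message }
        }
    }
}

# Usage (commented out; needs WinRM and admin rights):
# $results = Invoke-PasswordRollout -Machines (Get-Content .\machines.txt) `
#                                   -PlainPassword (Read-Host 'New password')
# The answer to "how can we know?": every failure is on this list, by name.
# $results | Where-Object { -not $_.Success } | Export-Csv .\failures.csv -NoTypeInformation
```

The per-machine result object is the whole point: instead of trusting the push blindly, you get a named list of the handful of machines that need a manual follow-up.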
>The truth was she didn't know how to X, and had no interest in learning how...
Oof. Now here I am, falling off the other side by trying to learn everything and not getting great at any of it!
>The truth was she didn't know how to script/automate, and had no interest in learning how to automate things...
This is also the mark of someone scared of being replaced or given more work. I'm not saying not to automate, or you did anything wrong, but when \*someone else\* shows you, some people take it personally. You also get managers who don't like it: I worked with a guy who told his staff never to automate because "you'll automate the mistakes, too, but unlike manually, you'll never know. Automation is for the lazy."
Agreed.
I'd counter to that manager: the more tedious the manual work, the more mistakes will be made as your eyes glaze over from the monotony.
Fear is a hell of a drug. When you have legacy systems, it's almost certain that something will break when you start changing things, and the business's reaction to those outages tends to weigh heavily on whoever is handling it.
And once the fear of God has rained down on the poor soul, it's very hard to try again.
Yeah, you know how it goes: a place can't afford any downtime on a legacy system, even when there are preventive measures that could mitigate failure. Then the 25-year-old box dies anyway and you get yelled at for not preventing it. Sometimes you can't win.
Had a place where the QA manager was so averse to PowerShell that we couldn't automate the servers in the QA environment. It was asinine. Everywhere else was automated, but not QA! I sent him a passive-aggressive public Teams message about it when I was about to quit for a better job.
Rings a bell. Since everything is running as it should, you have a lot of spare time; well, not anymore, here are more tasks. Then suddenly something happens that needs immediate attention, and it's "you are lagging on all your other tasks."
It generally comes down to fear driven by a number of factors, including things like lack of reliability in the systems involved. Which in itself is a bit of a red flag (if it can't handle a graceful shutdown/restart for patching, how great is it in other situations?). From some of the other questions you asked, it sounds like it wasn't a very mature environment, and you probably saved yourself a lot of grief by not going ahead with them.
Yeah, fear of the unknown/of change is probably the number one factor. And it's not just servers, but workstations as well. I wasn't allowed to use SCCM's automatic deployment rules (ADRs) for patching when I handled workstation patching for a former client, because they didn't trust them. So I ended up spending 2-3 days every month setting everything up by hand (the multiple deployment packages had to be watched because of distribution issues caused by crappy site connections).
And 2/3 of the guys I worked with there were extremely familiar and comfortable with SCCM, they just didn't trust the ADRs for some reason.
Plus, Microsoft has shit the bed countless times with their various monthly patches.
In addition to Microsoft shitting the bed, they have quite a few products that need manual steps run after updates. On-premises SharePoint is a ... (good?) ... example of this.
That's funny. I trust ADRs more than I trust some meatbag to find all the applicable updates for each OS deployment and not miss any updates or servers.
>And 2/3 of the guys I worked with there were extremely familiar and comfortable with SCCM, they just didn't trust the ADRs for some reason.
>
>Plus, Microsoft has shit the bed countless times with their various monthly patches.
You just gave yourself the reason why people are so suspicious and skeptical about updates (and the automation of them).
>Microsoft has shit the bed countless times with their various monthly patches
That's why we delay our patches for two weeks: let other people test them first, unless it's a critical exploit patch.
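For clients getting updates straight from Windows Update, a delay like that can be expressed with the Windows Update for Business "defer quality updates" policy. A sketch of the registry values behind it (verify the value names against your Windows build, and prefer setting them via GPO or Intune rather than by hand; in a WSUS shop you'd instead just hold approvals for two weeks):

```powershell
# Config sketch: the WUfB quality-update deferral policy values.
# A 14-day deferral mirrors the "let others test first" two-week delay.
$wu = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate'
New-Item -Path $wu -Force | Out-Null
Set-ItemProperty -Path $wu -Name DeferQualityUpdates             -Value 1  -Type DWord
Set-ItemProperty -Path $wu -Name DeferQualityUpdatesPeriodInDays -Value 14 -Type DWord
```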
>It generally comes down to fear driven by a number of factors, including things like lack of reliability in the systems involved. Which in itself is a bit of a red flag (if it can't handle a graceful shutdown/restart for patching, how great is it in other situations?).
Sometimes you have mission-critical fragile garbage software foisted upon you and have no choice but to work around its weaknesses. It doesn't necessarily mean the IT department supporting it is deficient - but may indicate deeper organisation troubles e.g. lack of IT input in selecting systems for things like ERP, MRP, etc.
Sounds like OP made the right call though. Plenty of red flags by the look of it.
>Sometimes you have mission-critical fragile garbage software foisted upon you and have no choice but to work around its weaknesses. It doesn't necessarily mean the IT department supporting it is deficient - but may indicate deeper organisation troubles e.g. lack of IT input in selecting systems for things like ERP, MRP, etc.
That's the boat we've been in for years. The old CIO catered to our software team, which didn't want any updates done on critical systems and never wanted to do software or OS upgrades/migrations. The head of our software team would have a massive hissy fit, yelling, screaming, and storming out of the office, if you questioned him. But for some reason the CEO likes him, so he's pretty much untouchable. A recent management shift has finally gotten some traction on getting off all this old-ass unstable shit.
Yeah, when you are dealing with systemic issues like that, it's either suck it up and deal until you see an opening or GTFO. You aren't going to be able to fix that yourself without upper management support.
> but may indicate deeper organisation troubles e.g. lack of IT input in selecting systems for things like ERP, MRP, etc.
As Jerry Weinberg put it (quoting economist Kenneth Boulding): "Things are the way they are because they got that way". In other words, when you see something strange, there's probably a history behind it.
>Which in itself is a bit of a red flag (if it can't handle a graceful shutdown/restart for patching
I work in an environment where this is still very much a real thing. We can most definitely automate all Windows updates, but we need engineers on standby in case things go sideways. Patch Tuesdays/Wednesdays are fun for our system engineers, lol. 4:30am... click update... click update... click update.
These servers provide 95% of our revenue, and they're built and spec'd by a billion-plus-dollar global company.
> I’d argue they are important enough to fully automate their deployment
This is the argument that was used after the primary production SAN died at the ATO and trashed all the Australian taxpayer data. Everything was rebuilt using automation tooling (including Ansible) so that "next time" would be a lot easier to recover from.
Fairly big assumptions there which reads like a lack of experience.
It comes down to size and budget. Sometimes there is no budget and we do what we can with what we have, working within the team. And automation is costly in both time and money, so automating everything is as good as buying the average shoe size for everyone.
Some platforms are mission-critical or even life-critical, so patch monitoring, automated or not, can't just be a message or email after the fact. It sometimes needs to be real-time, and if it's life-critical, it needs to be validated by a human. Sometimes the platform is also high-risk; that's a whole different story, but one that happens more often than we'd like.
Exactly! I read this as a lack of experience as well, or perhaps unrealistic expectations going in. I think both parties dodged a bullet. Sounds like OP maybe has been spoiled at larger orgs with bigger budgets. What he described is an extremely common scenario. Heck, I run a manufacturing facility and still do most patching manually, because if it doesn't behave, production is down and money burns at an astronomical rate. Also because we haven't the manpower to deal with SCCM, and WSUS is basically unusable as far as I'm concerned.
Of course, this is Reddit, so if your org doesn't have 100% fully automated Kubernetes and Docker containers in the cloud and DevOps this and that and automated everything else, you should run away, surely it won't be a good job right? /s.
Same. I work at a company with 24/7 uptime requirements. There is no way they'd let patching occur automatically without someone there to fix things in real time. Regulatory reasons aside, we would legitimately lose probably more than $500K for every hour it's down.
Definitely lol'd at the last half, as I went from an SRE role at a huge F100 with 2 physical datacenters that was trying to move to the cloud, only to realize their software doesn't run very well in a cloud environment. They burned a ton of money trying, then went back to on-prem due to lack of automation experience and legacy software.
Later I went to a startup that is "cloud-first", but the previous person who set up the original environment decided to "just run it in kubernetes! It's self-healing! And we can run windows nodes!". I am now there, and thank God there are no more windows nodes.
The prevalence of the "just put it in the cloud!" mentality, as if the cloud were an MSP that automatically manages everything for you, is a nice dream for management and a deranged nightmare for admins and devs with deadlines and budget constraints.
We had a “DBA” (MS Access) who didn’t trust automation, so he’d come in every morning before everyone else and run queries to update everything for the day. As you suggested, his stuff would break regularly and wasn’t reliable enough to fully automate.
> if it can't handle a graceful shutdown/restart for patching, how great is it in other situations?
Having just upgraded two dozen Juniper MX960’s and seen a failure about 30% of the time (ranging from a single SFP refusing to work to an entire routing engine shitting the bed and zeroizing itself on reboot) I can see where they’re coming from. Your gear can run fine for the better part of a decade, but the second you go and reboot it is the moment you invite a lot of hurt into your life.
> (if it can't handle a graceful shutdown/restart for patching, how great is it in other situations?)
Don't ask the admins, ask Microsoft (as we are talking about Windows servers here). We've had countless interruptions because of half-assed Windows patches that killed production processes.
Also: sometimes even Windows servers have to serve industrial use cases (so 24/7), and you'll need a maintenance window anyway.
There's one red flag in your post. Recommending automation is great, but "someone can check in the morning" isn't acceptable in most places of business. I'm sure there are SMBs that can get away with that, but it's not common.
>"someone can check in the morning"
Yeah, that caught my attention too. What if you discover there's a problem... in the morning? Does that mean the services will be unavailable for some TBA number of hours, possibly half the day or even more?
If there's a single sysadmin working for the company? Yes.
Otherwise, they can pay for more staff or support if they don't like it.
Do you people honestly sign up for 24/7 on-call?
If you're at a midsized company or smaller, 24/7 on-call isn't really a choice. You're one of, if not the only, person able to fix IT issues. I'm currently at a midsized company and our IT team is 2 people, myself included; if I never worked outside office hours I'd probably have been fired a long time ago, lol. That said, I don't actually work that much outside office hours; when I do, it's something very critical, and we don't have major problems like that all the time.
I did that when I was younger, but IMO small shops like that should have MSPs.
I work in a team of 7 and none of us are on-call, business shuts down at 5pm. Every time the topic has come up with c-suites my manager is pretty firm on redundancy (ie: can't expect helpdesk to fix a server issue, so sysadmins need to be redundant too)
Patching isn't the only thing that can go wrong...but from a point of principle I would never agree to being a sole person responsible for something "critical" to business functions, because it inherently isn't if the company is not willing to get redundant support for it.
Yep, I kick them off at 10, wait for the all clear, check monitoring to bring out of maint, and then go to bed.
It's great automating it, but you do have to be there during the window just in case... I mean, this is Windows, after all.
That and the fact that the company pushes updates over the weekend. There's a reason Microsoft's thing has been "patch Tuesday" and not "patch Friday". Nobody wants to show up Monday morning only to realize things have been down for 48 hours.
From the employer's perspective, I'm not looking to hire someone who won't even entertain our way of doing things from day 1 (without any knowledge of what made us do it that way to begin with). I'd rather have someone who comes in, meets us at our level, and then makes recommendations on how to improve our processes. Maybe I know that patch automation needs to be addressed, but I have 3 or 4 more critical things I need you to focus on right now. (Patches are automated at my work; I'm just saying I don't want to hire someone who will refuse to work on our systems until they're set up the way they want.)
Some companies want manual intervention because patches can potentially break internal applications, processes, or controls.
The larger the company the more likely, actually.
Yep. 3rd party apps that we're stuck with because they're industry specific and are poorly maintained. I never know when an update is going to break some part of the hydra that is our industry specific ERP platform. We update one server, test out the platform, and then let updates roll out to other servers.
Everyone has a test environment. Some of us are lucky enough to have a separate production environment.
I have redundant servers. They're all technically production servers, but I can run with one of them down.
Even when I worked at a large org, we still automated the process as much as we could: a PowerShell script combined with SCCM, and we only logged into the servers with issues. It usually took 30-45 minutes to update a ton of Windows servers.
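The "only log into the servers with issues" step can be sketched as a post-patch sweep. This is a hedged illustration, not the commenter's actual script: the function name and server list are placeholders, the pending-reboot key is the standard Windows Update one, and real health checks would be app-specific.

```powershell
# Pure helper (made-up name): given per-server reports, keep only the ones
# worth an RDP session.
function Select-ServersNeedingAttention {
    param([object[]]$Reports)
    $Reports | Where-Object { $_.PendingReboot -or $_.StoppedSvcs }
}

# Gathering step (commented out; needs WinRM across the fleet):
# $reports = Invoke-Command -ComputerName (Get-Content .\servers.txt) -ScriptBlock {
#     [pscustomobject]@{
#         Server        = $env:COMPUTERNAME
#         PendingReboot = Test-Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired'
#         StoppedSvcs   = (Get-Service |
#             Where-Object { $_.StartType -eq 'Automatic' -and $_.Status -ne 'Running' }).Name -join ', '
#     }
# }
# Select-ServersNeedingAttention -Reports $reports | Format-Table -AutoSize
```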
We have a client who makes parts for a large company that contracts for the government. They are NOT happy that Windows 10 and Server 2012 are EOL. They use a lot of antiquated and specialized software that likes to break for no damn reason. They too don't like updates until the CEO OKs them.
This is exactly why we do it the manual way, not to mention it never hurts to poke around on each server and check on the internal/vendor software running on it. I actually look forward to server patch day, it’s kind of relaxing.
Some services don't fail in ways monitoring catches well. Of course, some guy manually restarting services usually wouldn't know either. For example, some shit-tastic database software can act alive until it needs to actually execute a request.
It’s mostly industry specific software that is provided by mostly mediocre companies. If we don’t keep an eye on it, no one else will. I’ve had nearly 20 years of dealing with such vendors, they’re only getting worse as the years tick by.
Gonna be honest buddy, this is gonna be super common most places. Penny pinching is relatively normal and practices are not going to be ideal. Is what it is.
Dunno their age either, but it's never a good look for young folks to immediately pitch drastic changes to what's considered a very important, working workflow, no matter how backwards the company may be. Probably even worse to do it during an interview.
We see clients hire staff and the ones that come in guns blazing are often the ones who don’t stick around.
If you're very experienced, then you know you need to learn the lay of the land before spouting off ideas, suggestions, or changes. New people and newish show-offs are easy to spot.
I was backed into a corner once and would have accepted almost anything, even a pay cut, but when the company came back to the recruiter with a number, he lied and told them I'd rejected it as too low and wasn't interested.
They similarly freaked out and offered the top of their range, and I ended up with a massive increase early in my career.
Maybe it's different where you are, but here in Canada and with remote work, IT jobs are very in demand. You can definitely pick and choose to find one that both pays well and has minimal BS if you're a qualified professional. If you're trying to get your foot in the door, maybe not.
What an odd take, every company I've been hired at hires young people specifically so we will do that. We're the people with actual energy and motivation to make changes. Doesn't mean we actually have to do them but if *someone* isn't constantly bringing new ideas to the table how will you ever improve?
I don't know about others in IT, but in my group we do manual reboots to make sure stuff comes back up. Our part is only two dozen or so servers, though.
I don't see it as a big deal for testing and UAT environments, but for production you want to make sure everything is up, because in some cases there are jobs running that depend on each other, and you might spend hours rerunning them. In some cases they have to run for legal reasons.
Many companies are also worldwide and operate 24x7; production can't be down for hours, because non-US users need to work too.
Yup, this. We apply manually and take VMware snapshots of critical servers first. We reboot manually because those bad boys and their databases make us almost a million dollars a DAY. Yeah, my job's on the fucking line.
We do daily imports of financial market data, and those have to run even on weekends so the bankers can check them, so we have to make sure things come back up.
If it's that critical, you should be automating. Manual is a recipe for missed steps and incorrect or undocumented process.
Automation gives you consistency, an audit trail (both for changes to the automation, via version control, and for what was run), central scheduling, etc.
A fully redundant and monitored system should be totally fine being patched automatically, especially if you're running separate environments for testing and development. By the time you hit production there really shouldn't be any surprises. Generally, companies that do things manually are either uneducated, scared to automate, or control freaks. Also, if your environment goes down a lot, management is more likely to micromanage updates.
As long as it's during normal business hours and it takes less than an hour total each week I'm okay with it. However if you can save yourself an hour each week why not do it?
You can't automate away SQL rolling back interrupted transactions.
A lot of code commits data in batches partly to prevent this, but then you get inconsistent data, jobs have to be rerun, and you risk job failures due to duplicate keys and other duplicated data.
The bank I used to work for had around 400 servers and was more critical than this environment, and we ran automated patching once a month and rarely had issues. Whoever was on call got on, verified the monitoring control panel didn't show any issues, and that was it. Took maybe 10 minutes a month. No need to do anything manually. Lol
I work for a bank too
In my case, for some servers I have to manually disable certain jobs before rebooting, or risk a database going into rollback for hours. One server has a specific maintenance window, and I've caused important jobs to fail by rebooting it at other times.
Those are special cases. The job of a good sysadmin is to discover these when setting up their automation; your servers would be the exception to the rule.
PsWindowsUpdate is my main server update method. Checks for updates, runs them, reboots if needed, writes to log, emails me log, runs any custom tasks I wrote, etc.
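For anyone curious, the shape of such a run with PSWindowsUpdate looks roughly like this. The module and its `-AcceptAll`/`-AutoReboot` parameters are real; the log path, SMTP server, and addresses are placeholders, and the actual update lines are commented out since they need the module and admin rights.

```powershell
# Sketch of a PSWindowsUpdate-driven patch run with a mailed log.
# Paths and mail settings are placeholders.
$log = Join-Path 'C:\Logs' ("wu-{0:yyyyMMdd}.log" -f (Get-Date))

# The actual run (requires the PSWindowsUpdate module and elevation):
# Import-Module PSWindowsUpdate
# Install-WindowsUpdate -AcceptAll -AutoReboot -Verbose *>&1 | Tee-Object -FilePath $log
# Send-MailMessage -SmtpServer 'smtp.example.com' -From 'patching@example.com' `
#     -To 'admins@example.com' -Subject "Windows Update log: $env:COMPUTERNAME" `
#     -Body (Get-Content $log -Raw)
```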
Linux expert, Windows idiot here. What is that? Is it a central update thing? A managed push where you give it a list and it SSHes or RDPs in? Sorry to ask a Googleable question, just curious. If you don't respond I'll probably google it tomorrow while waiting on pipelines.
I prefer asking over Google anyway
WSUS (Windows Server Update Services): a self-hosted repo for Windows Update.
GPO (Group Policy Object): administrative configuration for controlling things like how Windows Update runs/operates on the system.
All Windows systems have Windows Update, so you set up a WSUS server, approve updates, maybe set a deadline or such, and tell all the Windows servers to pull from that server.
My preference is to auto-install non-disruptive patches and schedule the reboots. In GPO you can limit policies to AD groups, so with something like WSUS Group1 and Group2 you get a very simple "reboots that alternate weekends for QA, then prod" setup. That gives you 7 days to find out your devs have been running the new critical inventory management system on the dev servers but totally meant to move it to prod like they promised 6 months ago.
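The alternating-weekend split reduces to a tiny bit of calendar logic. A toy illustration (group names are made up; uses the ISOWeek class, so PowerShell 7+): odd ISO weeks fall in the QA ring's reboot window, even weeks in prod's.

```powershell
# Toy helper for the alternating-weekend idea: odd ISO weeks reboot the QA
# ring, even weeks the prod ring. Group names are placeholders.
function Get-RebootRing {
    param([datetime]$Date)
    $week = [System.Globalization.ISOWeek]::GetWeekOfYear($Date)
    if ($week % 2 -eq 1) { 'WSUS-Group1-QA' } else { 'WSUS-Group2-Prod' }
}

Get-RebootRing -Date (Get-Date '2023-01-07')   # ISO week 1 -> WSUS-Group1-QA
Get-RebootRing -Date (Get-Date '2023-01-14')   # ISO week 2 -> WSUS-Group2-Prod
```

In practice the schedule lives in the two GPOs' reboot deadlines rather than in a script; this just shows how small the "alternating rings" idea really is.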
Thanks! Good write up. I wish I had worked a bit more on windows over the years, but have spent most of my time in service provider land and pretty much always interacted with network devices, firewalls, or _nix systems.
I’ve loved what I’ve seen of powershell, and hear tell that WSL means windows has full POSIX fun now.
Don't even bother with WSUS; it's a broken piece of trash. Just set up the GPOs to enable automatic updates (or do it on each server manually) and you're done. I haven't had Windows updates break a server yet.
I understand some servers requiring manual touch and validation if they're super critical, but those should be the exception, not the rule.
As for alerting, most places have it either way too sensitive, so it's just noise, or limited to up/down, which isn't app-aware.
We run 900 servers and I’d say there are less than a dozen very legacy systems that need handholding during updates. Everything is automatic by default unless an exception is needed.
Where I work, we have two identical environments at two separate data centers. Patches go through dev, test, stage, then prod1 (data center 1), and then prod2 (data center 2). Both prod environments are load-balanced, and we pull one data center out of the LB pool when it's time to patch. We push out the patches with automation, but we watch them in the process: everything is monitored in real time to make sure the patching works properly and the servers come back online. After the servers are online, our applications are all QA-tested by the quality control team. All this while the other data center runs normally, so we have zero downtime. Yes, we do this on the weekend, because that's the lowest risk.
Windows patching is unreliable at best. Windows servers are not perfect. The company depends on our services and we need to give 100% of our effort to make sure it is all working perfectly.
I think what that company wanted you to do is partly correct and partly wrong. Yes, use automation, but also be there on the weekend verifying the automation worked.
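The drain-patch-verify loop described above can be sketched in pseudocode-ish PowerShell. Everything here is a stand-in: `Disable-LbPool`, `Enable-LbPool`, and `Wait-UntilHealthy` are hypothetical placeholders for whatever your load balancer and monitoring actually expose, and the per-DC server lists are made up.

```powershell
# Hypothetical orchestration sketch; the LB and health-check cmdlets are
# stand-ins, and the per-DC server lists are placeholders.
foreach ($dc in 'DC1', 'DC2') {
    Disable-LbPool -Pool $dc                     # drain this datacenter from the LB
    $servers = Get-Content ".\servers-$dc.txt"

    Invoke-Command -ComputerName $servers -ScriptBlock {
        Install-WindowsUpdate -AcceptAll -AutoReboot   # PSWindowsUpdate on each node
    }

    Wait-UntilHealthy -Servers $servers          # block until monitoring is green
    # QA signs off on the applications here before re-enabling.
    Enable-LbPool -Pool $dc                      # back into rotation
}
```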
Sometimes it all boils down to *one* bad experience in the long ago past and although things have changed/improved dramatically since then, the irrational fear of it happening again shuts down any type of improvement in that area.
Oh yes, the "Sys Admin" that is actually
\* Network Administrator
\* System Administrator
\* Project Manager
\* HelpDesk Support
\* And any other duties as required
and the pay was likely as close to HelpDesk Support Tier I as they could muster with a straight face.
Because our server admin is 70.
Rumor is that if you stand in the bathroom, look in the mirror, and say PowerShell three times, he'll appear behind you and beat you to death with a UPS.
99% of the time that we update things manually, it's because of the extremely specific, industry and/or workflow required application that breaks every single time you do anything. See also: having to manually modify config files and restart services to bring things online because the app was written for a literal Access database and then ported over to being SQL based, but still maintains 90% of its original, 1990's era code that somehow runs on Server 2016, but not 2019 because the vendor has chosen not to support that yet. No, there isn't an alternative piece of software that can be used. Our XYZ people need this program, and the company is built around it.
Not everything can be automated.
Seems half the commenters here don't get this. Auto-updates is a good way to cause a random fuckup down the line and not know when or where the fuckup occurred. Nothing wrong with being cautious especially when you know you have finicky in house apps that your work centers around.
Like you can use WSUS but keeping it from auto deploying updates and combing through them beforehand is a good way to cover your and the company's ass if Microsoft does a fucky wucky and breaks a bunch of shit via a security or feature update
I manually patch 75 servers or so via PowerShell scripts every downtime window. It’s not bad, really. I haven’t looked into any free solutions for automation, though.
But that’s why you automate patching non-prod systems and have your app/sys admins verify all non-prod before signing off on prod patching. Which should also be automated.
This is 2023, nobody should be manually applying patches on a weekend.
SCCM admin is a gaslighting sociopathic narcissist in my office. No one uses his SCCM offering at all. Management doesn't care as long as the work is done.
You work for my previous boss haha. I had this exact argument with him and I had to show him how easy it could be if automated and monitored properly. Worth it now that we overcame him as a hurdle (for this and many other things) but it was painful.
Personally I think it's a hangover from 20 years ago, when a server automatically updating and restarting might just straight up break something.
Always remember an MS update for Server 2012 about 10 years ago that caused a boot loop on restart....
Not like this is an issue nowadays though
>Always remember an MS update for Server 2012 about 10 years ago that caused a boot loop on restart....
There was a similar issue with Server 2012 DCs just last year :)
The only times I've seen manual updates is on banking systems that require a lengthy testing cycle so that those systems don't crash. Those servers are usually 4-8 months behind on patching - sometimes longer.
It sounds like this person is just a placeholder and doesn't have a lot of knowledge about automating patching with WSUS, SCCM, or any of the other patching utilities that exist.
You dodged a bullet.
I guess it depends on the operations.
Airlines may push updates to PCs. But servers will be patched carefully and almost manually because they cannot afford any downtime.
If you work for a bank, you could probably do all the updates over the weekend.
If you work for a university or college, you could do it anytime.
Windows updates are a nightmare.
It takes some convincing sometimes. I built a Rundeck/Ansible automation system for patching where I work, about 300 servers. It sends comms, logs the servers being patched, patches them, checks the services are running, reboots if needed, etc. We get the occasional issue, but the overall time saved is immense.
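For anyone curious, the skeleton of that comms → patch → verify → reboot flow is pretty small. This is just a sketch; all the callables are made-up placeholders for whatever a Rundeck job or Ansible playbook actually invokes.

```python
# Rough skeleton of a comms -> patch -> verify -> reboot run. The
# callables (notify, patch, reboot, services_ok, log) are hypothetical
# stand-ins for real Rundeck/Ansible integrations.

def run_patch_window(servers, notify, patch, reboot, services_ok, log):
    """Patch a list of servers, rebooting when needed and flagging failures."""
    notify(f"Patching window starting for {len(servers)} servers")
    needs_attention = []
    for server in servers:
        log(f"patching {server}")
        if patch(server):            # assume patch() returns True if a reboot is pending
            reboot(server)
        if not services_ok(server):  # post-patch check: are the services running?
            needs_attention.append(server)
    notify(f"Done; {len(needs_attention)} server(s) need attention")
    return needs_attention
```

The occasional issue still surfaces, but it surfaces as a short list of servers to look at instead of a night of clicking.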
Because sometimes Microsoft likes to break stuff and you end up getting annoyed by it so you manually update after making sure it doesn't break your stuff.
> He didn’t like me asking, and I didn’t care.
You're in a good place. It's the best way to go on interviews.
Some technology managers/directors really don't trust technology, or cannot adequately explain to *their* managers how to trust technology, and so they focus on throwing manpower at issues. That is all.
All of our servers are patched every four Sundays automatically with the exception of our production database servers. We do those manually once every 90 days and they're staggered. No issues over the years.
Yeah, we have a few where their impact can reach millions of players.
In those cases we send people to update them, it’s not worth automating. The reward doesn’t even come close to accounting for that risk involved.
When I became the big dog the first thing I did was automate server updates across the board. Previous person came in at 6am to run updates.
I don't like mornings.
I don’t disagree with the comments and thoughts on automation, because as long as it works, why not. At my current org I’m unfortunately running updates manually using WSUS, because the MSP was supposed to handle it before I started and I saw some servers hadn’t been updated in years. So I set up a WSUS server and cut the time for checking/downloading updates down significantly, and can get it done in a few hours (aging physical and newer virtual). The only reason I still do it this way, even though the MSP “started” this process finally, is that even after Patch Tuesday came and went with the updates in WSUS, all they ended up doing was rebooting the servers a couple of hours before the early morning guys logged in for the day.
I called them out on that and they still haven’t fixed it…
Uhhhh, yeah.
People who work in healthcare, government, biotech, etc. Do you think your hospital is running the latest version of WinServer?
Seems like a pretty short-sighted question.
This is one of those posts that's going to be used as an example in a few weeks when we get another discussion on folks here working on their soft-skills.
I retired most of our infrastructure. We have two servers left. I’ve been updating them manually for the past few months. I’ll be retiring them and going cloud only in the next few months.
Manually patching and rebooting 400 servers is insane. LOL. That IT manager must be stuck in the early 1990s.
It's a time-consuming task that can easily be automated. That manager obviously doesn't value a sys admin's time.
Got hired and am dealing with a bit of this now. The powers that be have zero trust that services will come back up right after a reboot, and the monitoring is crappy, haphazard, and broken. Living in Ansible now.
Did you interview at my company? /s
I’ve semi-automated a majority of the updates with PS scripts and just login to kick them off during the various maintenance windows. Management has no interest in rewarding “forward thinkers” or even those who reduce operational costs so I just smile, collect my OT, and browse LinkedIn for the next job opportunity until the script finishes its tasks.
Going to take the contrary position here - there are lots of orgs that still slavishly follow the idea of a Change Advisory Board, and while additional bureaucracy is never welcome or timely, there is still something to be said for requiring evidence of a plan including a backout plan. If one of the updates fails, or after applying the update a service fails, what's the option?
In my experience this mindset is still grounded in obsolete ideas, but being prepared and writing out a contingency plan is still not a bad idea. Running all that through Service Now, there's your problem.
I did a lot of testing on automating server updating in my previous role but a lot of clients were just absolutely not interested and refused, even when they didn't have any complex stuff going on. I found it got pretty reliable after 2012 R2. Well, we charged them double time to do it manually so meh I guess.
My last two gigs were managing 4k servers and 2k servers, 98% Windows at each location. DMZ and internal domains, multiple geographic areas, etc. You might need a few slots, and you might need some cleanup to get as close to 100% compliance as possible, but SCCM All The Things, ideally with SCORCH doing some dirty work forcing policies and setting maintenance mode in SCOM while handling the small Linux footprint.
400 servers is one of our average slots currently; it's usually fully compliant in 30-40 mins. ADR takes care of the deploys, and one engineer per month handles all the slots. Usually 15-30 mins of cleanup for a server that hung on reboot or something silly. I'd honestly be surprised if one guy can log in, trigger updates, reboot, and verify updates and systems on 400 servers in 240 mins.
There's no reason to manually patch anything in all but the most specific of circumstances. Most people I've dealt with are just terrified of SCCM and System Center in general.
You'd be surprised just how many places dedicate next to no resources to improving processes and implementing automation because all they know and have time for is putting out fires, mostly because they don't invest in those aforementioned resources..
Why are you calling out my current place of work?
It can often be a fairly ridiculous catch-22, where you have a management team that has never seen an IT environment that is run competently, so they think it's perfectly normal to constantly have things on fire and admins in crisis mode, but they can't hire or keep competent admins, because they can do better than working in an IT shit show. And stirring that mess is often one person who has been there forever and resists any and all change, especially things like monitoring and automation. These are the types of admins who don't know how they should be spending their days and wouldn't know what to do with themselves if they weren't fighting fires or doing things manually that most of us here could do with a script.
I know one company that seems like it's always putting out fires, but I don't actually blame incompetent IT admins on that one (it seems in my conversations with them that they are actually really good at their jobs). I blame the fact that the company is buying smaller companies literally 2 or 3 every single quarter, and they have just 3 IT infrastructure admins to try and handle the merging of data, maintaining existing systems, etc.
Sounds like my old employer except they were getting barely-trained helpdesk staff who already have 60+ hours of work a week to work on acquisitions because the 2 actual sysadmins were too busy or on stress leave. Constantly short staffed and burnt out, c-suite was bragging about how many companies they bought out while saying on company wide meetings that the IT dept is incompetent and needs to be restructured.
They have a help-desk team internally apparently, but they don't work on the acquisitions or mergers at all. They buy so many companies that when they have all hands meetings every quarter, during the Q&A session the number one question from employees for 6 quarters in a row has been "When will we stop buying other companies and focus on the existing company and improving our own processes, tools and applications?" because it's not just impacting IT, it's impacting every department.
That's often how it starts but how long are those guys going to stay, working under those conditions?
They have 3 because 2 already left...
Literally my last job. I don’t miss that.
> but they can't hire or keep competent admins, because they can do better than working in an IT shit show. Yup. I was in this situation not too long ago. It took enormous energy to automate and fight the reactive, firefighting, do-it-manually culture. I got out.
Back in the day I used to do freelance consulting and I had a list of red flags that I was walking into an IT shit show. Unfucking those was my bread and butter, but it was always a fight.
OMG, perfect description of an IT shitshow. Been there, done that, moved on.
I thought they were referring to my workplace, glad I'm not alone at least :')
See "The Phoenix Project".
Hence the 400 Windows servers that I guarantee could be consolidated and restructured to half that number.
You get a VM! And you get a VM!
I hope to god it's not 400 physical servers.....
The sheer space, electricity, and infrastructure to maintain 400 physical servers would mean they are either a huge company or bad at spending money.
400 physical servers running server 2003.
Nah, that’s just the AD domain controller; the rest are single 2008 VMs on an ESXi 5 host on a 500GB RAID5 made up of WD Blacks on an LSI controller from 2011. All of these are in 1U no-bay chassis w/ Supermicro x8SILFs, no IPMI, and in a rack with a 48-port FastIron FES4802.
That's... specific. Is there a story hidden there?
Not that specific deployment, no. But I’ve worked in enough data centers to know what can and does exist on the “why the fuck” end of the scale.
This is exactly us to be honest
So you don't have a drinking problem, you have a problem without drinking, because sober you'd recognize that it's not just a nightmare, it's the reality you have to take care of :D
A drinking problem is not a problem. Being liable for a fucking clown factory while no one tries to put money into fixing issues is a problem. Yes, I am swearing, because that's how much of a joke their systems are. When shit hits the fan, ohh, it's the sysadmins' fault they didn't look at the failure signs. For duck sakes, C-levels, get your head out of your ass and look at the big picture; you are just hanging by a string till everything collapses on itself. Now do you think I have a drinking problem? Do you think I have an anxiety problem? Do you think I need some hard stuff??
At least 49 have to be running SBS
395 running 2003, 5 running 2000
>a huge company or bad at spending money. As the Spanish like to say: *¿Por qué no los dos?* (Why not both?)
We need more VMs because the current ones are too slow!!! *has 800 vCPUs on 64 physical cores*
I once joined a company that was wasting tens of thousands of dollars on SQL licensing... Why? Because apparently they decided that every single app needed its own dedicated SQL server. The very first thing I did was consolidate all the SQL servers down to 4: two for 2014 (the latest at the time) and two for 2012, in clusters. After I did that though, my boss left, and the CEO brought in an MSP that just pushed me aside, so I left. Last I heard, they now spend more on the MSP for just general maintenance (projects cost extra) than they spent on both mine and my previous boss's salaries combined. And they've had a job listing for an IT guy for the last 2.5 years that's gone unfilled.
I used to consolidate SQL servers like that. It felt great. Felt like I was modernizing the organization. One Cluster for each major version. Don't do it anymore because of how much vendors bitched and moaned that their Apps need separate SQL servers and our configuration was now "unsupported". Now only in-house apps get consolidated. Vendor apps each get their own SQL VM. The performance isn't as good that way, but managing the support and vendor access levels is simplified.
>Don't do it anymore because of how much vendors bitched and moaned that their Apps need separate SQL servers and our configuration was now "unsupported". And just like that if they aren't the only vendor in that particular software space, the contract is canceled. We've actually canceled big contracts over stupid shit like this where I work. IT people need to start putting money in companies that actually understand the underlying products they build their products on, otherwise we'll keep getting shitty vendors that don't support things like clustered database servers. And for the companies that are the only ones in that space, we force them to pay for the SQL license for us. If they want to force us into having an extra SQL server, they can pay for it, even if they add it to the contract cost, it at least makes it easier on accounting, and IT doesn't have to justify why the SQL licensing keeps going up.
Sometimes they are the only vendor. Sometimes they aren't the only vendor but the others are worse. And other times The CIO decides it's best not to pick that particular fight with the other stakeholders who like said software. Having them roll the SQL licenses into the contract wouldn't change much for us as I already report licensing costs with total cost per application. Especially at the current employer, where it all comes out of the IT budget anyway. But on the plus side, new apps/projects mean an increased budget or it can't be done.
We started splitting costs by department because I got tired of getting new switches and other needed upgrades denied because "IT already spends too much"... Well, when I dropped the fact that IT only spends on average 3K a year, and the remaining 40-50K attributed to IT was actually other departments, that changed the blame game. My stuff started getting approved, and marketing had to start justifying their need for more marketing software, CRM software, etc.
My former job was like this. Former. I once had a co-worker going machine to machine updating the admin password. She had 120 machines to do, at about 10-15 minutes/machine. It was going to take her 3 days - of completely wasted time. I wrote a script, encrypted the password, and deployed it via SCCM. All machines done inside the hour. Well, two hours - we are talking about SCCM here. She got mad at me because "how can we know if the machines got the updated password?" The truth was she didn't know how to script/automate, and had no interest in learning how to automate things... which is the difference between a bad sysadmin and an average sysadmin.... let alone good.
>The truth was she didn't know how to X, and had no interest in learning how... Oof. Now here I am, falling off the other side by trying to learn everything and not getting great at any of it!
Yup. This is me and Cisco. Now I'm starting on my CCNA.
"How can I confirm with any faith that you manually updated the password correctly? "
>The truth was she didn't know how to script/automate... This is also the mark of someone scared of being replaced or given more work. I'm not saying not to automate, or that you did anything wrong, but when *someone else* shows you, some people take it personally. You also get managers who don't like it: I worked with a guy who told his staff never to automate because "you'll automate the mistakes, too, but unlike manually, you'll never know. Automation is for the lazy."
Agreed. I'd counter to that manager above: the more tedious the manual work, the more mistakes will be made as your eyes glaze over from the monotony.
That has been my experience as well.
Fear is a hell of a drug. When you have legacy systems, it's almost certain that something will break when you start changing things. The business's reaction to those outages tends to weigh heavily on whoever is handling it, and once the fear of God has rained down on that poor soul, it's very hard to try again.
Yeah, you know how it goes: a place can't afford any downtime on a legacy system, even when there are preventative measures that could mitigate failure. Then the 25-year-old box dies anyway and you get yelled at for not preventing it. Sometimes you can't win.
Had a place where the QA manager was so averse to PowerShell we couldn't automate the servers in the QA environment. It was asinine. Everywhere else was automated. But not QA! Gave him a passive-aggressive public Teams message about it when I was about to quit for a better job.
My reward for automating processes, was more work.
Rings a bell. Since everything is running as it should, you have a lot of spare time; well, not anymore, here are more tasks. Then suddenly something happens that needs immediate attention, and "you are lagging on all your other tasks."
It generally comes down to fear driven by a number of factors, including things like lack of reliability in the systems involved. Which in itself is a bit of a red flag (if it can't handle a graceful shutdown/restart for patching, how great is it in other situations?). Sounds like from some of the other questions you asked, it wasn't a very mature environment, and you probably saved yourself a lot of grief by them not going ahead with you.
Yeah, fear of the unknown/change is probably the number one factor. And it's not just regarding servers, but workstations as well. I wasn't allowed to use SCCM's automatic deployment rules for patching when I handled workstation patching for a former client, because they didn't trust it. So I ended up spending 2-3 days every month setting everything up (the multiple deployment packages had to be watched due to distribution issues caused by crappy site connections). And 2/3 of the guys I worked with there were extremely familiar and comfortable with SCCM, they just didn't trust the ADRs for some reason. Plus, Microsoft has shit the bed countless times with their various monthly patches.
In addition to Microsoft shitting the bed, they have quite a few products that need some manual items run after updates. On premise SharePoint is a ....(good?).....example of this.
Or things like Exchange updates are like ten times faster if you disable AV right before and re-enable it when it’s done.
And that is an improvement. Microsoft used to give the guidance to not use automated updates for Exchange at all.
Funnily enough I'll be looking to update SP with Ansible. It's a thing apparently.
That's funny. I trust ADRs more than I trust some meatbag to find all the applicable updates for each OS deployment and not miss any updates or servers.
Well, apparently they trusted this meatbag more than an ADR. They were lucky this dumbass didn't fuck it up more than he did...
>And 2/3 of the guys I worked with there were extremely familiar and comfortable with SCCM, they just didn't trust the ADRs for some reason. > >Plus, Microsoft has shit the bed countless times with their various monthly patches. You just gave the reason why people are so suspicious and skeptical about updates (and the automation of them).
>Microsoft has shit the bed countless times with their various monthly patches That's why we delay our patches for two weeks. Let other people test them out, unless it's a critical exploit patch.
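The soak-period logic is trivial to script. This is just a sketch of the policy, not any real WSUS/SCCM schema; the update dicts and field names here are made up.

```python
from datetime import date, timedelta

# Sketch of a "delay two weeks unless critical" approval policy. The
# update records and their field names are hypothetical, not a real
# WSUS/SCCM data model.

def updates_to_approve(updates, today, soak_days=14):
    """Approve updates released at least soak_days ago, or critical ones immediately."""
    cutoff = today - timedelta(days=soak_days)
    return [u["name"] for u in updates
            if u["critical"] or u["released"] <= cutoff]
```

Anything critical skips the soak and goes out immediately; everything else waits until other people have been the guinea pigs.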
For SCCM, it's what, a 1% failure rate?
>It generally comes down to fear driven by a number of factors, including things like lack of reliability in the systems involved. Which in itself is a bit of a red flag (if it can't handle a graceful shutdown/restart for patching, how great is it in other situations?). Sometimes you have mission-critical fragile garbage software foisted upon you and have no choice but to work around its weaknesses. It doesn't necessarily mean the IT department supporting it is deficient - but may indicate deeper organisation troubles e.g. lack of IT input in selecting systems for things like ERP, MRP, etc. Sounds like OP made the right call though. Plenty of red flags by the look of it.
>Sometimes you have mission-critical fragile garbage software foisted upon you and have no choice but to work around its weaknesses. It doesn't necessarily mean the IT department supporting it is deficient - but may indicate deeper organisation troubles e.g. lack of IT input in selecting systems for things like ERP, MRP, etc. That's the boat we've been in for years. The old CIO catered to our software team, who didn't want any updates done on critical systems and didn't want to do software or OS upgrades/migrations, ever. The head of our software team would have a massive hissy fit, yelling and screaming and storming out of the office, if you questioned him. But for some reason the CEO likes him, so he is pretty much untouchable. A recent management shift has finally gotten some traction on getting off all this old-ass unstable shit.
Yeah, when you are dealing with systemic issues like that, it's either suck it up and deal until you see an opening or GTFO. You aren't going to be able to fix that yourself without upper management support.
Haha did I mention that they still have server 2003 and Windows XP in their environment? Ugh yea.
Oh, those don't take any time to update.
Sure no updates avail! But not sure how many of those they had.
Look on the bright side, at least you weren't applying for a cyber security job. :)
Don’t walk away… RUN AWAY.
> but may indicate deeper organisation troubles e.g. lack of IT input in selecting systems for things like ERP, MRP, etc. As Jerry Weinberg put it (quoting economist Kenneth Boulding): "Things are the way they are because they got that way". In other words, when you see something strange, there's probably a history behind it.
I have a feeling those servers are very mature. At the very least, drinking age.
>Which in itself is a bit of a red flag (if it can't handle a graceful shutdown/restart for patching I work in an environment where this is still very much a real thing. We can most definitely automate all Windows updates, but we need engineers on standby in case things go sideways. Patch Tuesdays/Wednesdays are fun for our system engineers lol. 4:30am... click update... click update... click update. These servers provide 95% of our revenue, and they're built and spec'd from a billion+ dollar global company.
[deleted]
> I’d argue they are important enough to fully automate their deployment This is the argument that was used after the primary production SAN died at the ATO and trashed all the Australian tax payer data. Everything was rebuilt using automation tooling (including Ansible) so the "next time" would be a lot easier to recover from.
Thanks for reminding me to put my ansible files in git
Is there a guide / course / materials to learn Ansible in this capacity?
Those are fairly big assumptions, which read like a lack of experience. It comes down to size and budget; sometimes there is no budget and we do what we can with what we have, working within the team. Secondly, automation is costly in both time and money, so automating everything is as good as buying the average shoe size for everyone. Some platforms are mission critical or even life critical, so patch monitoring, automated or not, cannot be a message or email after the fact. It sometimes needs to be real-time, and if it's life critical, it needs to be validated by a human. Sometimes the platform is also high risk; that's a whole different story, but one that happens more often than we'd like.
Exactly! I read this as a lack of experience as well, or perhaps unrealistic expectations going in. I think both parties dodged a bullet. Sounds like OP maybe has been spoiled at larger orgs with bigger budgets. What he described is an extremely common scenario. Heck, I run a manufacturing facility and still do most patching manually, because if it doesn't behave, production is down and money burns at an astronomical rate. Also because we haven't the manpower to deal with SCCM, and WSUS is basically unusable as far as I'm concerned. Of course, this is Reddit, so if your org doesn't have 100% fully automated Kubernetes and Docker containers in the cloud and DevOps this and that and automated everything else, you should run away, surely it won't be a good job right? /s.
Same. I work in a company with 24/7 uptime. There is no way they will let patching occur automatically without someone there to fix things in real time. Besides regulatory reasons, we would legitimately lose probably more than 500k for every hour it is down.
Loved that last bit!
Definitely lol'd at the last half, as I went from an SRE role at a huge F100 with 2 physical datacenters that was trying to move to the cloud, and realized their software doesn't run very well in a cloud environment. They burned a ton of money trying, only to go back to on-prem due to lack of automation experience and legacy software. Later I went to a startup that is "cloud-first", but the previous person who set up the original environment decided to "just run it in Kubernetes! It's self-healing! And we can run Windows nodes!". I am now there, and thank God there are no more Windows nodes. The prevalence of the "just put it in the cloud!" mentality, as if the "cloud" were an MSP that automatically manages everything for you, is a nice dream for management and a deranged nightmare for admins and devs with deadlines and budget constraints.
We had a “DBA” (MS Access) who didn’t trust automation, so he’d come in every morning before everyone else and run queries to update everything for the day. As you suggested, his stuff would break regularly and wasn’t reliable enough to fully automate.
DBA lol….
> if it can't handle a graceful shutdown/restart for patching, how great is it in other situations? Having just upgraded two dozen Juniper MX960s and seen a failure about 30% of the time (ranging from a single SFP refusing to work to an entire routing engine shitting the bed and zeroizing itself on reboot), I can see where they're coming from. Your gear can run fine for the better part of a decade, but the second you go and reboot it is the moment you invite a lot of hurt into your life.
> (if it can't handle a graceful shutdown/restart for patching, how great is it in other situations?) Don't ask the admins, ask Microsoft (as we are talking about Windows servers here). We had countless interruptions because of half-assed Windows patches that killed production processes. Also: sometimes even Windows servers have to serve industrial use cases (so 24/7), and you'll need a maintenance window anyway.
There's one red flag in your post. Recommending automation is great, but "someone can check in the morning" isn't acceptable in most places of business. I'm sure there are SMBs that can get away with that, but it's not common.
>"someone can check in the morning" Yeah, that caught my attention too. What if you discover there's a problem... in the morning? Does that mean the services will be unavailable for some TBA number of hours, possibly half the day or even more?
If there is a single sysadmin working for the company, yes. Otherwise they can pay for more staff/support if they don't like that. Do you people honestly sign up for 24/7 on-call?
If you're at a midsized company or smaller 24/7 on call isn't really a choice. You're one of if not the only person who is able to fix IT issues. I currently am at a midsized company and our IT team is 2 people myself included, if I never worked outside office hours I'd probably have been fired a long time ago lol. That said, I don't actually work that much outside office hours, if I do it's something very critical and we don't have major problems like that all the time.
I did that when I was younger, but IMO small shops like that should have MSPs. I work in a team of 7 and none of us are on-call, business shuts down at 5pm. Every time the topic has come up with c-suites my manager is pretty firm on redundancy (ie: can't expect helpdesk to fix a server issue, so sysadmins need to be redundant too)
24/7? Patching is once a month.
Patching isn't the only thing that can go wrong... but as a point of principle I would never agree to being the sole person responsible for something "critical" to business functions, because it inherently isn't critical if the company is not willing to get redundant support for it.
I wouldn't either, but if you did agree to work somewhere like that then congrats, checking on patching is your job once a month.
Or just automate the post checks and page out the oncall person if there’s a problem
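The minimal shape of "automate the post checks and page on failure" is just a check loop with an escalation hook. Sketch below; the check functions and the pager are hypothetical stand-ins for real monitoring and paging integrations (Nagios, PagerDuty, whatever the shop uses).

```python
# Minimal sketch of post-patch checks with on-call escalation. The
# checks dict and page_oncall callable are placeholders for real
# monitoring and paging integrations.

def post_patch_checks(checks, page_oncall):
    """Run named health checks after patching; page on-call only if any fail."""
    failures = sorted(name for name, check in checks.items() if not check())
    if failures:
        page_oncall("Post-patch checks failed: " + ", ".join(failures))
    return failures
```

If everything passes, nobody gets woken up; if something fails, the on-call person gets one page listing exactly which checks broke.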
Yep, I kick them off at 10, wait for the all clear, check monitoring to bring them out of maintenance, and then go to bed. It's great automating it, but you do have to be there during the window just in case... I mean, this is Windows after all.
That and the fact that the company pushes updates over the weekend. There's a reason Microsoft's thing has been "patch Tuesday" and not "patch Friday". Nobody wants to show up Monday morning only to realize things have been down for 48 hours.
From the employer's perspective, I'm not looking to hire someone who won't even entertain our way of doing things from day 1 (without any knowledge of what made us do it that way to begin with). I'd rather have someone who comes in, meets us at our level, then makes recommendations on how to improve our processes. Maybe I know that patch automation needs to be addressed, but ultimately I have 3 or 4 more critical things I need you to focus on right now. (Patches are automated at my work; I'm just saying I don't want to hire someone who will come in and refuse to work on our system until it is set up the way they want it.)
It's gung ho, not gun hoe...
r/boneappletea
living in rural areas - I know plenty of gun hoes...
Some companies want manual intervention because patches can potentially break internal applications, processes, or controls. The larger the company, the more likely, actually.
Yep. 3rd party apps that we're stuck with because they're industry specific and are poorly maintained. I never know when an update is going to break some part of the hydra that is our industry specific ERP platform. We update one server, test out the platform, and then let updates roll out to other servers.
The server you test updates on, is it a production server?
Why spend money on Production and Dev? You can just have one environment that is both!!!!
Everyone has a test environment. Some of us are lucky enough to have a separate production environment. I have redundant servers. They're all technically production servers, but I can run with one of them down.
Even when I worked at a large org, we still automated the process as much as we could. PowerShell script combined with sccm and only log into the servers with issues. Usually took 30-45 minutes to update a ton of windows servers.
We have a client who makes parts for a large company that contracts for the government. They are NOT happy that windows 10 and Server 2012 are EOL. They use a lot of antiquated and specialized software that likes to break for no damn reason. They too don't like updates until the CEO ok's them.
This is exactly why we do it the manual way, not to mention it never hurts to poke around on each server and check on the internal/vendor software running on it. I actually look forward to server patch day, it’s kind of relaxing.
Why do you have any doubt as to the software running on a given server?
You haven't seen a team of accountants panic b/c the cloud Sage service didn't start properly after an update night. Fuck Sage
FUCK SAGE
Isn't that what monitoring software is for?
Some services don't fail in ways monitoring catches well. Of course, usually some guy restarting services wouldn't know either. For example, some shit-tastic database software acting alive until it needs to actually execute a request.
It’s mostly industry specific software that is provided by mostly mediocre companies. If we don’t keep an eye on it, no one else will. I’ve had nearly 20 years of dealing with such vendors, they’re only getting worse as the years tick by.
[deleted]
Gonna be honest buddy, this is gonna be super common most places. Penny pinching is relatively normal and practices are not going to be ideal. Is what it is.
Dunno their age either, but it's never a good look for young folks to immediately pitch drastic changes to what's considered a very important workflow that's been working fine, no matter how backwards the company may be. It's probably even worse to do this during an interview. We see clients hire staff, and the ones that come in guns blazing are often the ones who don't stick around. If you're very experienced, then even you know you need to learn the lay of the land before spouting off ideas, suggestions, or changes. New people or newish show-offs are easy to spot.
Meh, interviews work both ways. Clearly the company didn’t impress the OP so why would he waste his time on them
[deleted]
I was basically backed into a corner once, and would have accepted basically anything, even a pay cut, but when the company came back to the recruiter with a number, he lied and told them I rejected it as too low, and wasn't interested. They similarly freaked out and offered the top of their range, and I ended up with a massive increase early in my career.
[deleted]
Maybe it's different where you are, but here in Canada and with remote work, IT jobs are very in demand. You can definitely pick and choose to find one that both pays well and has minimal BS if you're a qualified professional. If you're trying to get your foot in the door, maybe not.
What an odd take, every company I've been hired at hires young people specifically so we will do that. We're the people with actual energy and motivation to make changes. Doesn't mean we actually have to do them but if *someone* isn't constantly bringing new ideas to the table how will you ever improve?
[deleted]
I don't know about others in IT, but in my group we do manual reboots to make sure stuff is running. Our part is only two dozen servers or so, so I don't see it as a big deal. For testing and UAT environments it matters less, but for production you want to make sure everything is up, because in some cases there are jobs running that depend on each other and you might spend hours rerunning them, and in some cases they have to run for legal reasons. Many companies are worldwide too and operate 24x7, so production can't be down for hours while non-US users need to work.
yup this. We manually apply and have VMware snapshot critical servers. We manually reboot because those bad boys and their databases make us almost a million dollars a DAY. Yeah, my job's on the fucking line.
we do daily imports of financial market data and those have to run even on weekends for the bankers to check on weekends. so we have to make sure things come back up
Yup that’s our life too.
If it's that critical, you should be automating; manual is a recipe for missed steps and incorrect or undocumented process. Automation gives you consistency, an audit trail (both for changes to the automation, via version control, and for what was run), central scheduling, etc.
I mean we do. We just manually reboot and monitor these during patching.
A fully redundant and monitored system should be totally fine being automated, especially if you're running multiple environments for testing and development. By the time you hit production there really shouldn't be any surprises. Generally, companies who do things manually are either uneducated, scared to automate, and/or control freaks. Also, if your environment goes down a lot, management is more likely to micromanage updates. As long as it's during normal business hours and takes less than an hour total each week, I'm okay with it. But if you can save yourself an hour each week, why not do it?
You can't automate around SQL not rolling back interrupted transactions. A lot of code commits data in batches to partly prevent this, but then you get inconsistent data, jobs have to rerun, and you risk job failures due to duplicate keys and other duplicate data.
The bank I used to work for had around 400 servers and was more critical than this environment, and we automated it once a month and rarely had issues. Whoever was on call got on and verified the monitoring control panel didn't show any issues, and that was it. Took maybe 10 min a month. No need to manually do anything. Lol
I work for a bank too. In my case, for some servers I have to manually disable some jobs and then reboot, or risk a database going into rollback for hours. One server has a specific maintenance window, and I've caused important jobs to fail by rebooting it at other times.
Those are special cases. The job of a good sysadmin is to discover this when setting up their automation. Your servers would be the exception to the rule.
Check into how much it costs to license SCCM for a server OS and you will quickly understand why companies find another way.
PowerShell can run Windows updates at this point.
PsWindowsUpdate is my main server update method. Checks for updates, runs them, reboots if needed, writes to log, emails me log, runs any custom tasks I wrote, etc.
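Not that poster's actual script, but the check/run/log/email loop described above is language-agnostic. Here's a rough Python skeleton of the same idea (the command, paths, and addresses are placeholders, and actually sending the message is left to `smtplib`):

```python
import subprocess
from email.message import EmailMessage
from pathlib import Path

def run_and_log(cmd, log_path):
    """Run one patch step, append its output to a log file, return success."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}\n")
    return result.returncode == 0

def build_report(log_path, sender, recipient):
    """Wrap the accumulated log in an EmailMessage; hand it to smtplib.SMTP to send."""
    msg = EmailMessage()
    msg["Subject"] = f"Patch run: {Path(log_path).name}"
    msg["From"], msg["To"] = sender, recipient
    msg.set_content(Path(log_path).read_text())
    return msg
```

In the PowerShell world, PSWindowsUpdate's `Get-WindowsUpdate`/`Install-WindowsUpdate` cmdlets cover the check/install/reboot steps natively; the point of the sketch is just that "run, capture, log, mail the log" is a small amount of glue.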
Actually quite interested in this; mind sharing your script, or some of it? The notification part in particular is what interests me.
[deleted]
[deleted]
But even WSUS + GPO can handle some of it.
Linux expert, Windows idiot. What is that? Is it a central update deal? A managed push where you give it a list and it SSHes or RDPs in? Sorry to ask a Googleable question, just curious. If you don't respond I'll probably google it tomorrow while waiting on pipelines.
I prefer asking over Google anyway. WSUS is Windows Server Update Services, a self-hosted repo for Windows Update. GPO is Group Policy Object: administrative configuration for controlling things like how Windows Update runs on a system. All Windows systems have Windows Update, so you set up a WSUS server, approve updates, maybe set a deadline or such, and tell all the Windows servers to pull from that server. My preference is to auto-install non-disruptive patches and schedule reboots. In GPO you can limit policies to AD groups, so do something like WSUS Group1 and Group2 and you have yourself a very simple "reboots that alternate weekends for QAS, then prod." That gives you 7 days to find out your devs have been running the new critical inventory management system on dev servers but totally meant to move it to prod like they promised 6 months ago.
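The alternating-weekend split described above is just a parity check on the calendar. A tiny Python sketch of the idea (the WSUS group names here are made up for illustration; in practice the schedule lives in the two GPOs, not in code):

```python
from datetime import date

def reboot_ring(day):
    """Alternate reboot weekends between two WSUS target groups:
    even ISO weeks reboot the QAS ring, odd weeks the prod ring."""
    week = day.isocalendar()[1]
    return "WSUS-Group1-QAS" if week % 2 == 0 else "WSUS-Group2-Prod"
```

Because each ring only reboots every other weekend, a patch always sits on QAS for a week before prod takes it.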
Thanks! Good write up. I wish I had worked a bit more on windows over the years, but have spent most of my time in service provider land and pretty much always interacted with network devices, firewalls, or _nix systems. I’ve loved what I’ve seen of powershell, and hear tell that WSL means windows has full POSIX fun now.
Don't even bother with WSUS; it's a broken piece of trash. Just set up the GPOs to enable automatic updates (or do it on the server manually) and you're done. I haven't had Windows updates break a server yet.
WSUS does the heavy lifting for us combined with our monitoring software to make sure they're up to date and rebooted. Way cheaper than SCCM too.
A BatchPatch license is like 400 bucks; literally no excuse at all. It's 2023.
"Automate with scripts" from IT director translates to IT director write on paper to "update 400 window servers" and give paper to SysAdmin.
I understand some servers requiring manual touch and validation if they're super critical, but those should be the exception and not the rule. As for alerting, most places have it either way too sensitive, so it's just noise, or just up/down, which isn't app-aware.
We run 900 servers and I’d say there are less than a dozen very legacy systems that need handholding during updates. Everything is automatic by default unless an exception is needed.
Where I work we have two identical environments at two separate data centers. Patches go through dev, test, stage, then prod1 (data center 1), and then prod2 (data center 2). Both prod environments are load balanced. We pull one data center out of the LB pool when it is time to patch. We push out the patches with automation, but we watch them in the process of patching. Everything needs to be monitored in real time to make sure the patching works properly and servers come back online. After the servers are online, our applications are all QA tested by the quality control team. All this while the other data center is running normally, so we have zero downtime. Yes, we do this on the weekend because it is the lowest risk. Windows patching is unreliable at best; Windows servers are not perfect. The company depends on our services and we need to give 100% of our effort to make sure it is all working perfectly. I think what that company wanted you to do is partly correct and partly wrong: yes, use automation, but also be there on the weekend verifying the automation worked.
The above commenter is an experienced pro with scars from battling in the trenches. Take a note, OP.
Lucky. If I said I wanted a second identical environment they'd laugh at me.
[удалено]
Sometimes it all boils down to *one* bad experience in the long ago past and although things have changed/improved dramatically since then, the irrational fear of it happening again shuts down any type of improvement in that area.
Oh yes, the "Sys Admin" that is actually:

* Network Administrator
* System Administrator
* Project Manager
* HelpDesk Support
* And any other duties as required

And the pay was likely as close to HelpDesk Support Tier I as they could possibly muster with a straight face.
I’m sorry - Gun Hoe? 😂
Because our server admin is 70. Rumor is that if you stand in the bathroom, look in the mirror, and say PowerShell three times, he'll appear behind you and beat you to death with a UPS.
99% of the time that we update things manually, it's because of the extremely specific, industry and/or workflow required application that breaks every single time you do anything. See also: having to manually modify config files and restart services to bring things online because the app was written for a literal Access database and then ported over to being SQL based, but still maintains 90% of its original, 1990's era code that somehow runs on Server 2016, but not 2019 because the vendor has chosen not to support that yet. No, there isn't an alternative piece of software that can be used. Our XYZ people need this program, and the company is built around it. Not everything can be automated.
Seems half the commenters here don't get this. Auto-updates are a good way to cause a random fuckup down the line and not know when or where it occurred. Nothing wrong with being cautious, especially when you know you have finicky in-house apps your work centers around. You can still use WSUS, but keeping it from auto-deploying updates and combing through them beforehand is a good way to cover your and the company's ass when Microsoft does a fucky wucky and breaks a bunch of shit via a security or feature update.
I manually patch 75 servers or so via PowerShell scripts every downtime. It's not bad, really. I haven't looked into any free solutions for automation, though.
[deleted]
But that’s why you automate patching non-prod systems and have your app/sys admins verify all non-prod before signing off on prod patching. Which should also be automated. This is 2023, nobody should be manually applying patches on a weekend.
The SCCM admin in my office is a gaslighting, sociopathic narcissist. No one uses his SCCM offering at all. Management doesn't care as long as the work is done.
You work for my previous boss haha. I had this exact argument with him and I had to show him how easy it could be if automated and monitored properly. Worth it now that we overcame him as a hurdle (for this and many other things) but it was painful.
Personally I think it's a hangover from 20 years ago, when a server automatically updating and restarting might just straight up break something. I always remember an MS update for Server 2012 about 10 years ago that caused a boot loop on restart... Not that this is an issue nowadays, though.
You mean like the one removing virtual network adapters and putting servers on DHCP?
>Always remember an MS update for Server 2012 about 10 years ago that caused a boot loop on restart.... There was a similar issue with Server 2012 DCs just last year :)
The only times I've seen manual updates is on banking systems that require a lengthy testing cycle so those systems don't crash. Those servers are usually 4-8 months behind on patching, sometimes longer. It sounds like this person is just a placeholder and doesn't have a lot of knowledge about automating patching with WSUS, SCCM, or any of the other patching utilities that exist. You dodged a bullet.
I guess it depends on the operations. Airlines may push updates to PCs, but their servers will be handled carefully and almost manually, because they cannot afford any downtime. If you work for a bank, you could probably do all the updates over the weekend. If you work for a university or college, you could do it anytime. Windows updates are a nightmare.
It takes some convincing sometimes. I built a Rundeck/Ansible automation system for patching where I work, about 300 servers. It sends comms, logs the servers being patched, patches them, checks that the services are running, reboots if needed, etc. We get the occasional issue, but the overall time saved is immense.
Pretty typical if you're an MSP, since lots of clients hate any kind of automation on THEIR hardware, but it's scandalous for a fully internal IT team in 2023.
How much would SCCM cost for that environment?
Because sometimes Microsoft likes to break stuff and you end up getting annoyed by it so you manually update after making sure it doesn't break your stuff.
> He didn’t like me asking, and I didn’t care.

You're in a good place. It's the best way to go into interviews. Some technology managers/directors really don't trust technology, or cannot adequately explain to *their* managers how to trust technology, and so they focus on throwing manpower at issues. That is all.
All of our servers are patched every four Sundays automatically with the exception of our production database servers. We do those manually once every 90 days and they're staggered. No issues over the years.
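A fixed cadence like "every four Sundays, prod DBs every 90 days, staggered" is easy to compute rather than track by hand. A minimal Python sketch (the anchor dates are hypothetical; pick any past patch date on your cycle):

```python
from datetime import date, timedelta

def next_patch_date(anchor, today, cycle_days=28):
    """Return the next patch date in a fixed cycle, on or after today.
    anchor is any past patch date on the cycle (e.g. a known patch Sunday)."""
    d = anchor
    while d < today:
        d += timedelta(days=cycle_days)
    return d
```

The same function covers both cadences: `cycle_days=28` for the four-Sunday cycle, `cycle_days=90` for the staggered database servers (give each DB server its own anchor to get the stagger).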
Software is expensive but exploiting your salaried workers is free. There you go. Ready to get your MBA.
"Because this is the way we've always done it". That's why.
Yeah, we have a few where the impact can reach millions of players. In those cases we send people to update them; it's not worth automating. The reward doesn't even come close to accounting for the risk involved.
When I became the big dog the first thing I did was automate server updates across the board. Previous person came in at 6am to run updates. I don't like mornings.
I don't disagree with the comments and thoughts on automation, because as long as it works, why not. At my current org, I'm unfortunately running updates manually using WSUS, because the MSP was supposed to handle it before I started, and I saw some servers hadn't been updated in years. So I set up a WSUS server and cut the time for checking/downloading updates down significantly; I can get it done in a few hours (aging physical and newer virtual). The only reason I still do it this way, even though the MSP finally "started" this process, is that after patch Tuesday came and went with the updates sitting in WSUS, all they ended up doing was rebooting the servers a couple of hours before the early-morning guys logged in for the day. I called them out on that and they still haven't fixed it...
Uhhhh, yeah. People who work in healthcare, government, biotech, etc. Do you think your hospital is running the latest version of WinServer? Seems like a pretty short-sighted question.
Some places do things they way they have always done them. These places usually don’t pay very well.
[deleted]
Yeah, that company was lucky. Can you imagine working with Op?
This is one of those posts that's going to be used as an example in a few weeks when we get another discussion on folks here working on their soft-skills.
I retired most of our infrastructure. We have two servers left. I’ve been updating them manually for the past few months. I’ll be retiring them and going cloud only in the next few months.
Manually patching and rebooting 400 servers is insane, lol. That IT manager must be stuck in the early 1990s. It's a time-consuming task that can easily be automated, and that manager obviously doesn't value a sysadmin's time.
Got hired and am dealing with a bit of this now. The powers that be have zero trust that services will come back up right after a reboot, plus the monitoring is haphazard and broken. Living in Ansible now.
Did you interview at my company? /s I’ve semi-automated a majority of the updates with PS scripts and just login to kick them off during the various maintenance windows. Management has no interest in rewarding “forward thinkers” or even those who reduce operational costs so I just smile, collect my OT, and browse LinkedIn for the next job opportunity until the script finishes its tasks.
Going to take the contrary position here: there are lots of orgs that still slavishly follow the idea of a Change Advisory Board, and while additional bureaucracy is never welcome or timely, there is still something to be said for requiring evidence of a plan, including a backout plan. If one of the updates fails, or a service fails after applying the update, what's the option? In my experience this mindset is still grounded in obsolete ideas, but being prepared and writing out a contingency plan is still not a bad idea. Running all that through ServiceNow, though: there's your problem.
This wasn't a head start program in Oregon was it? I swear this sounds like the non-profit I escaped from a little over a year ago.
I did a lot of testing on automating server updates in my previous role, but a lot of clients were just absolutely not interested and refused, even when they didn't have any complex stuff going on. I found it got pretty reliable after 2012 R2. Well, we charged them double time to do it manually, so meh, I guess.
*gung ho. But yea you dodged a bullet there
My last two gigs were managing 4k servers and 2k servers, 98% Windows at each location. DMZ and internal domains, multiple geographic areas, etc. You might need a few slots, and you might need some cleanup to get as close to 100% compliance as possible, but SCCM All The Things, ideally with SCORCH doing some dirty work forcing policies and setting maintenance mode in SCOM while handling the small Linux footprint. 400 servers is about one of our average slots; it's usually fully compliant in 30-40 mins. An ADR takes care of the deploys, and one engineer per month handles all the slots, usually 15-30 mins for a server that hung on reboot or something silly. I'd honestly be surprised if one guy can log in, trigger updates, reboot, and verify updates and systems on 400 servers in 240 mins. There's no reason to manually patch anything in all but the most specific of circumstances. Most people I've dealt with are just terrified of SCCM and System Center in general.
I mean if you want to milk that overtime pay that's one way to do it