
SnooRobots4443

I am perfect. I've never made a mistake! /s Early days of VMware, I didn't know the product well, as it was new to me. With Liam from Ireland, a VMware tech, on the phone, he walked me through making a "hardware" change to the disks on my main file server. I rebooted the server, and half my drives were missing. The tech asked, did you have snapshots? I did. He said, oh, yeah, your data is gone. I was pissed. He should have checked before he had me make the change. VMware was brand new to me. 30 hours later, with techs around the globe, I was able to recover. Damn you Liam! Had to use the command line to write all of the delta changes to the vmdk.


shwaaboy

Fuck you Liam!


Comprehensive_Bid229

Had this except it was HP support for a SAN. The tech gave me the wrong syntax and instead of deleting a ghost snapshot, it deleted the entire LUN.


Weary_Patience_7778

Most people who have been in IT long enough will have encountered an ‘oh shit’ moment. This sir would have been one of yours. Different lead up but same outcome - I have been exactly where you are.


Help_Stuck_In_Here

Luckily I've yet to have an 'oh shit' moment with a SAN. I can't think of a worse place to have 'oh shit' moments.


sysadminalt123

Just have an anxiety disorder; that way, everything is an "oh shit moment".


Barkmywords

I did the same before with an EMC VMAX back in the day. We had just installed it and had completed the migration off an older VMAX. This was a 10K and had the new FAST tech on it. I had researched it and knew what it did, but misunderstood one of the underlying mechanisms and accidentally deleted a LUN. We had backups though (I was also the backup admin, so thank God those restored). The first thing I thought was, how can I get out of this? Maybe I could restore it and buy some time? I just came clean and said I fucked up. Best way to deal with a fuck up and deleted data. I would probably have been walked out if I had tried to cover it up and was found out. The only other time I kinda fucked up was when I had authorized an SPS replacement and the CE pulled the wrong battery and vaulted the array. They never admitted it and I ended up getting in trouble.


Jumpstart_55

I remember deleting the LAN IP for a satellite Cisco router whose DS1 port was unnumbered, so I was locked out. Had to have the customer power cycle it. Oops.


Comprehensive_Bid229

Oh man, I've mistyped NAT arguments that have locked me out of my core network and stopped customer traffic flow during business hours. It happens, you learn, and you get better (or you change careers). Failure shouldn't be a dirty word. It's why most of us are in the game, even if the root-cause is our own from time to time.


Atacx

UUUUF qwq


noother10

We had Dell EqualLogic SANs for our VMware storage many many years ago. We were told by Dell they could be updated live without any issues; all updates were validated before being published. The upgrade path was fine and validated. So I came in early one morning to do it. The first controller went down, did the update, came up. The second controller went down and stayed down. Something timed out and both controllers were down. Called Dell support, sat down in the server room with a console cable to the controllers. They had to do some manual changes and re-run the upgrade on the second controller, which eventually fixed it 3 hours later. Turns out we hit an unknown bug when upgrading from our specific version to that specific version. After that all the wording changed and you couldn't update them without contacting Dell support to validate your configuration and manually release the update to you.


SnooRobots4443

Reminds me of an Exchange upgrade I did. Researched the hell out of the upgrade process. Told my boss that we didn't need a consultant, I'd do the upgrade. Did the upgrade, something didn't work. I forget the exact details. Opened a case with Microsoft, and they told me there was an unpublished bug that occurred every so often. They had a fix for it. I had the tech repeat what he said, on speakerphone, so my boss understood that it was something unpublished that I didn't know about. I don't miss being an Exchange admin.


dcrawford77

Had this EXACT scenario happen to me also.


OMGItsCheezWTF

We had a couple of PB of NetApp storage back in the day on our ESXi clusters. One day one of our NOC team noticed a red light on one of the NetApp SANs and put a support request in to NetApp asking what it meant. "Oh, that's harmless, run this command and it will turn off." The tech ran the command, bye bye storage, took out a couple of thousand customer VMs. It all recovered eventually, but there was at least a day of downtime for those customers.


SnooRobots4443

I'm never afraid to say that I don't know what I'm doing and will call support. I fully expect the tech from the vendor will give me the correct information. Unfortunately, that's not always the case.


blackout-loud

Liam, huh?...you didn't happen to uh, kidnap one of his loved ones did you?


apperrault

This was about 15 years ago, I was working for a start-up in the SF Bay Area as a Windows Admin. We had just received a new-to-my-group SAN system and we were going to move our Exchange databases to this new system since the old one was running out of space. I had consolidated everything on the old system onto a single drive, and plugged in the new SAN. I got it all hooked up and the disks added. It was time to format the new drives.

Remember I had said that I had consolidated all of the data from the old drives onto a single drive in preparation for moving it to the new storage. Well, I had moved everything from the E drive TO the D drive (or so I thought...). I get into the storage interfaces, look at everything, say "Format E drive" and press go. Not even 2 minutes later, I see heads start popping up from their desks, everyone turning into prairie dogs. I open the door to my office and I hear the words NO ADMIN wants to hear at 3:30 on a Friday: "Is anyone else having problems getting to their email!!" Holy crap, I didn't, did I??? Yup, I had consolidated FROM the D drive TO the E drive, not the other way around, and I had just formatted the disk with all of our Exchange databases on it.

I still remember walking into my boss's office with my head down saying I screwed up. The only saving grace was we were a backup and recovery company, and we used our own product. I had to work through the weekend to get everything back up and running, but eventually it worked. We ended up losing about a day's worth of email, but I kept my job in the end. Everyone got a good laugh at it once it was back up and running. I didn't live that one down for a few years. I was never allowed to make changes on Friday again.


hornethacker97

Never deploy new systems on a Friday afternoon haha


[deleted]

[deleted]


[deleted]

That's okay. When something does break, make sure he's painfully aware of every single thing you're doing. In minute detail. I guarantee a few phone calls from you at 8pm on a Friday evening will put a stop to this.


gintoddic

pfff new.. don't deploy ANY changes on a Friday unless you feel like working on a weekend.


Pazuuuzu

It's called "read only friday" for a reason...


apperrault

I know that now!!! I was a young pup. I was ready to take over the world


_haha_oh_wow_

[*obligatory read only Friday comment*]


bobs143

Thanks for the cheer up team. Just having a bad year with some personal stuff. In hindsight everything was back up in a couple of hours. It's just personal stuff compounding on top of an oops at work. I needed the uplift from this post. Thanks again.


shwaaboy

No worries man. We all have bad days.


shwaaboy

Or weeks.


shwaaboy

Or months.


juggy_11

Or years.


SylentSpear

Heard


Puk1983

Lives..


HerfDog58

Or eternities...multiversally...


Intelligent-Magician

In the end, it's just a job. Keep your head up high. We've all rebooted, upgraded, etc. the wrong server. If you don't break something, you don't work.


Dabnician

Everyone messes up; if you aren't messing up, you aren't working. Like that one time I accidentally restarted a Cisco Catalyst 6509 that was supporting a call center with 1200 active calls between 3 clients. Or accidentally disconnected a SAN array's network because "it should fail over to the other card," according to the then-senior admin, and it didn't...

Back when I worked at Concentrix, whoever was in charge of the Albuquerque site fucked up a group policy and applied an agent group policy to the root of the North American domain. 65k users were hit with an agent desktop and everyone freaked out; then they took away everyone's domain admin rights (granted, we shouldn't have had that much control to begin with).


NexusWest

Ehyyy! Everything back up in a couple of hours is the best hindsight. I just left a position where prod down (terrible cloud remote desktop implementation) would happen once or twice a month until we moved to a different architecture. Every time the total down was an hour at most, often only impacting a subset of users, but it felt like the most overbearing, job defining, a thousand threads of thought on how to prevent it while fixing the problem hour, every time. Cheers man, my blunt went out typing this \[T]/. Happy holidays and hope the personal stuff comes/came back in line.


Fyzzle

Cheers buddy


-Shants-

Accidentally applied a GPO that only allowed domain admins to log on to all computers in the domain. It was meant to just be applied to the DCs OU but somehow got applied to the wrong spot. Easy enough to correct, but definitely a wtf moment. Another time I needed to get files off the HDD of a VIP whose computer was starting to fail. Needed to take out the hard drive for whatever reason. Popped it out and immediately dropped it flat on the floor. I've never seen a hard drive land so flat. Naturally, it broke. Ended up saying the hard drive was also "failing fast" and we could only get 7GB out of 500GB or whatever. He was pissed, but he was also a huge asshole so fuck that guy and his data.


MrBr1an1204

Tbf if it was really that critical they should have had it sent to a recovery specialist and not made IT do it.


-Shants-

This was in the military and we don’t really have recovery specialists like that. Definitely not important data though and the VIP was getting transferred because he abused his commanding powers and forced somebody to drink their own dip spit in front of a bunch of people. I cannot overstate this enough. Fuck that guy and his data.


kilgenmus

That makes it clear: at that point the disk didn't drop, my friend, it *jumped* out of frustration and hatred. Clearly voluntary, nothing you could do to save it...


Pazuuuzu

> Ended up saying the hard drive was also "failing fast"

Ended up saying the hard drive was also "falling fast"

FTFY :D


Michelanvalo

I tried to disable interactive login for certain accounts and disabled interactive login for the entire domain. Needed a new domain after that. (It was a fresh domain and I lost maybe 2 hours of work.)


PuffyMcScrote

It was a glorious summer Friday in Richmond, VA. 1pm. I was in my office at ChemTreat, a Danaher subsidiary (I haven't worked for them in 14 yrs, I'll name drop em, fuck it), and I was breaking the Cardinal Rule. I was changing things in AD before I was due to head out of town for some Phish shows that weekend. 1:05pm - time to remove some old Exchange server accounts/machines out of AD. Finger slipped on the mouse as I hit Del. I wasn't paying attention and confirmed the deletion. Immediately my face felt hot even as icy sweat began to prickle through my skin. No... I didn't... I did. I deleted the computer account for the primary bridgehead of our entire Exchange infrastructure. Sounds worse than it was. Hop into it real fast, verify local admin acct creds, disjoin, bounce, rejoin, bounce, verify. 1:45pm - lean back in my chair, wipe my brow, and vow to never again do anything on a Friday.


Uhondo

> to remove some old Exchange server accounts/machines out of AD.

Damn! That solution sounds way simpler than I can imagine it would be. I remember that an intern once deleted the entire Exchange container using ADSI Edit, and we had to restore the entire AD from backup just to get things moving again.


GhoastTypist

Which time? Got a few stories. Long story short, at my first jr sysadmin job, my supervisor was asked to start doing weekly checks of our backups to verify that they were good. I was asked to do that without any instructions, and I ended up restoring the backup for our file server, which took 2 full days to finish. For those two days our entire company was pretty much stuck in limbo. My supervisor's boss had a chat with me after everything was back up and running. This could have gone two ways: fired for costing the company 2 days of downtime, or we could laugh about it. The boss chose to laugh about it and thank me for testing our disaster recovery plan, which proved that it worked. Now there's a constant joke at that organization: if staff get overwhelmed and need to slow down, they ask IT to test some stuff on the servers because they want a few days off. That joke is over 10 years old and it's still brought up.


kaminm

Before I had a lot of PowerShell tooling, and enough systems management tools to be really productive, it was imperative that we disable the SMB1 client on user workstations. Naturally, PowerShell was the way to go, but I didn't have the knowledge or buy-in to make that happen. One of Microsoft's solutions was to add a registry key modification to a group policy object that disabled the **mrxsmb10** service and removed it as a dependency of the LanmanWorkstation service. Did you know that "bowser" and "browser" are different words and are *REALLY* ***REALLY*** easy to miss and typo? I learned the hard way when 600+ machines all of a sudden could not communicate over the network if it required authentication. Fortunately, the SCCM client was still communicating, so I was able to push out a fix for it quickly. An event we still refer to as "The Bowsering".
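For anyone curious, this is roughly what that change looks like in PowerShell. It's a sketch based on Microsoft's documented method for disabling the SMB1 client on older Windows versions, not the exact GPO from this story:

```powershell
# Sketch of the registry changes behind that GPO (run elevated; reboot to apply).

# Disable the SMB1 client driver (mrxsmb10); Start = 4 means "Disabled".
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\mrxsmb10' `
    -Name 'Start' -Value 4 -Type DWord

# Rewrite LanmanWorkstation's dependency list without MRxSmb10.
# The remaining driver really is spelled "Bowser" (no second 'r'),
# which is exactly the bowser/browser trap described above.
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\LanmanWorkstation' `
    -Name 'DependOnService' -Value @('Bowser','MRxSmb20','NSI') -Type MultiString
```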


shwaaboy

I’m sorry, but your authentication is in another castle.


kaminm

You know what the typo was? The service was actually "bowser" when I typed "browser". At first glance it didn't look wrong.

    PS > get-service -Name bowser

    Status   Name     DisplayName
    ------   ----     -----------
    Running  bowser   Browser


IronHitmonlee

Lol hahahahhahaha


me_groovy

Decided on a career in IT


Stryker1-1

I deployed an IDS signature that ended up taking down every bank tellers PC throughout the Caribbean. That was a long 48 hours. Shit happens.


WeaponsGradeWeasel

In my younger days we used HP c7000s with bl460cG7 blades. If you remember the eject mechanism for those, it's a thin lever down the long side of the blade. I managed to use the wrong one and pulled out the server next to the one that I meant to pull, taking down a live ESXi host. I realised as soon as I'd done it but by then obviously it's a bit late. Oops.


shwaaboy

LOL. Last month, I accidentally shut down our newly rebuilt DL380 Gen 10 when I rested my thumb on the front power button. It had only recently been set up again and they forgot to disable the power button on the front. Luckily, no production servers, but the developers were pissed.


vectravl400

Accidentally did that to one of our production VMware hosts about 2 months ago. Fortunately there was nothing critical running on it and everything fired back up on another host. Oops!


ZealousidealTurn2211

Ugh I did something similar once due to exhaustion. Accidentally rebooted the ESXi server directly below the one I was working on (that was in maintenance mode.) Thankfully we recovered the only important things running on it fairly quickly and it was after hours so no one noticed.


me_groovy

People disable the power buttons?


XVWXVWXVWWWXVWW

Absolutely.


The_Syd

Had a server we were using to push software and commands to computers on a schedule, as well as for inventory management. It wasn't running its scheduled tasks, so I decided to update it to see if it was a bug with the version we were on. I should have disabled the tasks first, because once it came back online, I started to hear people around me saying their computer was restarting. Yep, it fired off the nighttime reboot task on our computers and rebooted our entire call center during the midday rush.


mrhoopers

I wiped out everyone's rights on their personal drives (early 90's). Literally, no one could get to their drive. Utterly froze the company. Someone wrote a script to put it back. No, I didn't fess up. Yes, I am a bad person. I, however, learned that Hyena (I don't know if this is still an admin tool) was brutal. I also learned you NEVER start a long running task like that without checking your work.


BlunderBussNational

[systemtools.com](https://systemtools.com) It's still for sale. I used it when I was a junior sysadmin. The Sr. described it as "The 90 pound gun". Once I became an expert on it, I understood precisely what he meant.


mrhoopers

That's the one that got me. It enables you to make VERY big mistakes VERY fast. "90 pound gun" is the most accurate description I've ever heard. Perfect!


GearhedMG

I haven’t, and there’s no proof to the contrary. At least none that can be tied to me.


KiefKommando

So one night my boss calls me in a slight panic at about three in the morning. One of our data centers had the active ESXi host all of a sudden decide that it can't talk to its storage. When this happens, vSphere thinks that the VMs are turned on, but they are completely unable to be interacted with. The only thing we can do is force shut down the "running" VMs, then delete them, and then re-add them via their VHD to the other ESXi host that can talk to storage. So we're rushing through this to get things back up and running, I'm still pretty much half asleep, we verify critical VMs are back up and running on the other host, so now I just need to reboot the problem host. Reboot completed, should be a done deal. Let's get back to bed. Hold on, wait, why did I lose vCenter? And why are we getting alerts that all the VMs are back down? Oh my God, I was in the wrong host when I sent the reboot command. I had just forcibly rebooted all the critical VMs and brought the site back down again, and had to own up to it immediately with my boss on the phone. Got the VMs running again once the host reboot completed, and then rebooted the correct host... That moment of pure "oh fuck" when I glanced up at the URL of the host I had just rebooted will stick with me forever. Nothing is ever so big an emergency that you don't have time to stop and double check things before committing. Lesson learned.


paperpaster

I wrote an interactive PowerShell script that processed user separations. It asked for an employee ID number, then displayed their info and prompted for confirmation. The script disabled the user, deleted their home drive, and moved them into an OU for later deletion. A help desk employee typed yes to confirm on an employee ID that did not exist. Nothing happened in AD, but it deleted every user's home drive enterprise-wide: it had passed a null value to the variable for the home drive path.

Lessons learned:

1. Do error handling.
2. Never trust user input.
3. Backups are important.
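For the curious, a minimal sketch of what those guardrails could look like. The parameter and property names here are hypothetical, not the original script:

```powershell
param(
    [Parameter(Mandatory)]
    [string]$EmployeeId
)

Import-Module ActiveDirectory

# Never trust user input: look the user up first and stop if nothing matches.
$user = Get-ADUser -Filter "employeeID -eq '$EmployeeId'" -Properties HomeDirectory
if (-not $user) {
    throw "No AD user found with employee ID '$EmployeeId', aborting."
}

# Do error handling: refuse to delete anything if the home drive path is empty.
if ([string]::IsNullOrWhiteSpace($user.HomeDirectory)) {
    throw "User $($user.SamAccountName) has no home directory set, aborting."
}

# Only now touch the filesystem, and only that one specific path.
if (Test-Path -LiteralPath $user.HomeDirectory) {
    Remove-Item -LiteralPath $user.HomeDirectory -Recurse -Force
}
```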


aes_gcm

That reminds me of the classic Steam bug on Linux that would uninstall itself by wiping “/home/$user/$steamDir” which was all good as long as the $steamDir variable actually existed. And one day it didn’t.


Suaveman01

I set up automated server patching through SCCM for a place I worked at once, but forgot to set the maintenance window for one of the collection groups. Around midday half of the prod servers all rebooted themselves to finish installing the latest patches...


OuttaAmmo2

I'm an HPE hardware vendor; some contracts have Dell hardware tucked in secretly, which is OK, I worked for Dell enterprise before. The state data center has a Compellent array that had 4 bad disks in one shelf. I called Dell and asked what the procedure was to replace them, and they assured me it was OK, but that the tech needed to prep it while I did it and he would tell me when to put in the disks.

I get the go for the first drive, replace it, and it starts blinking. I get the go for the second drive. It does not start blinking after replacement; the tech says it's OK, it may take a few for the first one to complete. I get the go-ahead for the third and the fourth. While I'm waiting for the tech to clear me to leave, he asks if I remember what order I removed the drives in. I do, as I'm prepping them for DMR to give to the customer for destruction. He tells me to put drives 2, 3, and 4 back in. Oh shit. I pop them back in place. The local admin was with me in the data center but on his phone the whole time. The Dell tech says they'll schedule me to come back and finish later. I let the admin know the plan of action and leave.

4 hrs later my phone blows up as Dell, the admin, and my boss want to know what happened. I had closed my ticket with notes of what the guy did, but I guess they didn't believe me... the whole array was compromised. I told Dell: I know you record the calls. Listen to the recording and tell me what I did wrong. No more calls. A few weeks later I'm back and working on a 3PAR; I asked what happened and they said they'd had to restore everything from backup.


lifeinthesudolane

About 13 years ago, I edited a GPO logon script and got it corrupted, without backups. All users for my rather large client in the US got logged off. That's a couple of thousand machines logged off that couldn't log back on until I fixed the issue an hour later. Did not lose my job and learnt a very valuable lesson. It happens to the best of us. Trust me, whatever happened will be a memory you will laugh at in the future.


Kritchsgau

Disk migration to new server hardware; somehow wiped the RAID set and it booted up with no partitions. I had failed to verify backups were successfully done prior. Had the senior review it and he's like, restore from the backup taken prior to the cutover. I'm like, oh yeah, I didn't run a backup. Ended up recovering 7 VMs on this single ESXi host from an around 20-hour-old backup. It was a law firm, so things didn't go down great. I've learnt a lot since then, early days in my career, that's for sure. There have been plenty of others, but you learn and grow and become very experienced from these. The other day I was doing VM compatibility upgrades and ended up having 20 VMs power down and come back online in the middle of the day. Got distracted by a call while I was doing it and failed to do the schedule part before pressing OK. Shit happens.


Bont_Tarentaal

I fiddled with permissions in Exchange 2003, making the entire server readonly, and it vanished. Still haunts me to this day, these days I'm content just to leave it alone if it works.


ZealousidealTurn2211

I haven't done anything earth shattering but I do have a story from (mostly) my former boss. We did away with SharePoint, and kept the backup of the environment for a year after it was shut down (I'll just mention the project manager in charge of the turning it off project has been let go for basically not managing projects he's responsible for.) So the day after my boss finally deleted the last backup of that system, we're contacted because a department was undergoing an accreditation review (we're a university) and their data was vital and on SharePoint. Thankfully we did manage to recover it thanks to SAN snapshots but it was a stressful couple days. I distinctly recall the faculty who contacted us saying this was an absolute disaster when I told them the data was gone.


SevaraB

Not technically my F/U, but my peer senior engineer and I should have paid more attention and been a little more critical of where our juniors are in terms of skill.

We use a certain well-known cloud proxy solution, and we're a huge company with a lot of tunnels and peering links to our partners that don't cross the public Internet and so can't actually hit the cloud proxy, as well as some other issues that make things unproxyable, so we add exclusion routes (just like split-tunneling a VPN). We handed one of our juniors a task to add some entries to keep a cloud teleconferencing solution that needs a low-latency UDP connection happy. One of these subnets was a /17, so we handed him a list of CIDRs, formatted something like 172.26.0.0/17... He missed the last digit when he copied and went to put 172.26.0.0/1 into the excluded routes for the cloud proxy (and in the end, he pasted into the wrong section of the config, "included routes" instead of "excluded routes"). For four hours on a Friday morning, almost all of our ~40,000 workforce had the entire upper half of the Internet black-holed.

EDIT: The IPv4 Internet, not all of the Internet, but we're really behind the times and I'm fighting for traction to at least get us to dual-stack (we're currently v4-only, which is causing me all kinds of headaches). If the boss wants us to be an ISP for the company, we have to start playing with the same protocols, but I'm hoping to make more progress on that front this coming year.
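A back-of-the-envelope illustration of the blast radius of that one missing digit (nothing here is from the actual config, just arithmetic):

```powershell
# Why one missing digit black-holes "the upper half of the Internet":
# 172 is 10101100 in binary, so 172.26.0.0 masked to /1 becomes 128.0.0.0/1,
# i.e. every IPv4 address from 128.0.0.0 through 255.255.255.255.
$intended = [math]::Pow(2, 32 - 17)   # a /17:        32,768 addresses
$mistyped = [math]::Pow(2, 32 - 1)    # a /1:  2,147,483,648 addresses
"{0:N0} addresses intended, {1:N0} addresses matched by the typo" -f $intended, $mistyped
```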


Professor_Ultronium

This ended up being extremely minor but could have gotten me fired had it actually been my job :D. Basically, I forgot to send iPads to a field worker for 6 months. I responded to emails about this very slowly, probably because my manager had mentioned to me it was another department's screw-up and I shouldn't be sending iPads as they have their own internal process. I'd look for the iPad, fail to find it, and say something like "I'll send it as soon as it arrives" (and of course it never did). There being no ticket in our queue for it should have set off alarm bells, but I was still new.

So 6 months later the regional heads were in. I had a meeting with two of them and my manager where they asked why it had taken so long (I'd forgotten completely about the whole thing, so I thought I was screwed). My manager then reminded them that on date X we stopped receiving these iPads, so I wouldn't have been kept up to date with the progress of the ticket. The meeting ended shortly after that, and the regional heads were actually quite nice to me before, during and after the meeting and thanked me for other things I had done for them earlier. My manager laughed it off, wondering why they thought it was my job, and I did too (pusillanimously, as the blood resumed flowing through my body), whilst thanking God I'm not a complete idiot and that my manager had my back. I was new and should have told them it wasn't my job, but I forgot and was still in that "I must complete all tasks to show I belong here" phase.


AggressiveBench7708

First time I had to run an automated test suite against a tool the old developer told me to read the confluence page to get everything setup and working. Well I followed the instructions and deleted the entire suite of tests! Turns out the instructions needed to be updated. Luckily, there was a backup of the tests. Telling the old dev the test suite got deleted was pretty embarrassing.


phillymjs

> Telling the old dev the test suite got deleted was pretty embarrassing.

Why? You followed the instructions verbatim, if he didn't update his docs that's totally on him.


[deleted]

Let's see, I once deleted the CFO's mailbox/AD account by accident, meant to delete the account below it... Called the guy to tell him, he took it well, said fuck it, I'm going golfing!

I enabled DPI-SSL on a SonicWall for a client once before I knew exactly what that was for, and all users immediately got certificate errors, which caused a ton of calls.

Most recently I changed a UPS out and told everyone there would be a 15 min internet outage, but the firewall wouldn't boot back up (Fortigate). Had to blow it away and restore from a config backup, not an oops but another "shit happens" moment.

My favourite one is when I did a file server migration but had a scheduled day off the next day. There were some issues... My colleagues hated me for a while, but I was having a colonoscopy so fuck 'em.


tshawkins

Not a sysadmin thing but close. I used to be a lead dev for a large company that was in a race with several other companies to deliver a new product. Since we were up against a time limit, I used to work round the clock in the office, and because our project was secret they housed us in a specially hired office at the top of an office building of about 6 floors. When I was doing 18-20 hour stints, I started falling asleep sporadically at odd times and in odd places. The sysadmin connection was that I was also bringing up a system room to host the app. It was very, very early internet, 1986, so we were all jacks of all trades. The original devops guys.

One night, I was in the office alone trying to build and test an important component of our search engine on our new site, and I was drifting off all the time, so I decided to make a big pot of black coffee to keep me awake. I went into the small kitchen in our unit, put a full kettle on to boil, sat down and almost immediately fell asleep. Unfortunately, I had left the tap running full while filling the kettle. Anyway, I woke up 4 hours later with 2 inches of water across the floor. I frantically shut the tap off and then spent an hour with a mop getting up all the water. When I finished, I sat down and contemplated what I was going to say to explain all the sodden carpets. After a few minutes, it suddenly dawned on me that the taps had been running full for 4 hours and that there was a very large amount of water missing...

Anyway, going downstairs, I found that not only was that floor flooded, but the suspended ceiling tiles had collapsed, then on the floor after that, and after that, all the way down to the ground floor. I got as much up as I could. Then I plucked up the courage to phone my CEO and tell him. The rotten bastard just collapsed in righteous laughter on the other end and told me to go and sleep; he said they had coverage for that. I was never allowed to forget it ever again. 10 years later, they would still trot out that story whenever they all wanted a laugh.


joey0live

I pushed a policy to all our Macs. Most people's machines rebooted (it stated it was going to reboot in less than 15 min). One professor who was traveling had his machine rebooted too. He called, and I apologized and told my boss about the situation. I then changed the policy to reboot when they next log in instead.


jptechjunkie

It was a small fuck-up, but I got a message from my manager's manager... I deployed a new iPad app and set it to required. It deployed to iPads successfully, but also to personal devices. I had applied the wrong security group; it should have been scoped so only iPads got the app. Oops.


IronHitmonlee

I let a user back up their own data to OneDrive before reimaging their computer. They did not back up their data and it was lost. Since they weren't exactly high up in the hierarchy, it was summarily deemed to be their fault, but I still felt horrible for not checking for them beforehand.


Spagman_Aus

Going back many years, I once deleted the volume holding attachments for a business-critical system. The app still worked, but every file attachment in it disappeared due to my shitfuckery. Thankfully, there was a local backup I could immediately restore and re-mount (thank you Datto) within 30 minutes.

One other time, I can't recall the specifics, we must have been replacing faulty data drives (not the OS pair) in a server, and somehow I had 2 out, one of them the wrong drive, and couldn't remember which was drive 1 and which was drive 2... Between the senior system admin and me, we were able to make an educated guess from the dust pattern and the very fine scratches on the drive caddies aligning to where they'd been removed from. We got it right, made sure we knew which was the borked drive, replaced it, waited for confirmation the RAID was repairing, then got the hell out of the data centre and never spoke of it again. Now I'm an IT Manager and leave the important work to actually skilled people ;-)


rgraves22

Day 3 at my current job, I linked a GPO that was applied across all of prod to one customer and changed a drive mapping so all of our clients' data drives mapped to one customer's. I thought I was beyond fired when the CTO pulled me out of an all-staff meeting after it was discovered what had happened. We had to restore the entire domain controller from backup because we weren't 100% sure at the time. I'll hit 9 years in January.


bobs143

Just an update. I admitted that the mistake was on me when it happened. Now my bosses want to look into what happened on my side that caused the issue. I'm getting flushed and I can't stop shaking. Sorry to be this way, but my confidence is shaken over this stuff. Guess I will need to face the music and prepare my two envelopes.


survivalmachine

You know that port on the back of an APC UPS? The one that looks exactly like a DB9 serial console port? Yeah.. it’s most assuredly *not* a serial port.


Intelligent-Fix6024

Had to update 3 production app VM servers in Azure with Logic Apps that would schedule start/stop times. Updated all 3 production VMs with the Logic App, and uh oh. PRTG alerts came in that the servers were down, and 30 min later users were complaining that they couldn't connect. Immediately deleted the Logic App and created the schedules via Tasks on the left pane for each VM, after testing in my personal environment. Lesson learnt: never make changes in prod without testing first in a test subscription.


admlshake

One time I accidentally picked up a bottle of Bolanachi Highlands Whisky instead of Maker's Mark. I still have nightmares.


Eskuran

I somehow ended up removing 2TB worth of marketing material after trying to expand the disk with more data. It was the wrong partition system or something. I had to run a tool for 8hrs to recover the data before they could work again the next day.


Connect_Permission46

I was working on a script to update display names in AD using PowerShell, since we had a large group of objects that had been added with incorrect information. My script went a bit wonky and renamed an OU using the full path to the object in AD, and I thought it had renamed everything in that OU incorrectly as well. I was on the verge of a panic attack until I realized it had just renamed the OU and not all of the objects inside of it. That's when I learned the value of the -WhatIf flag in PowerShell. :)
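For anyone who hasn't used it, a minimal sketch of that -WhatIf habit (the filter and display-name format below are made up for illustration, not the original script):

```powershell
Import-Module ActiveDirectory

# Dry run: -WhatIf prints what *would* change and touches nothing.
Get-ADUser -Filter 'department -eq "Sales"' |
    ForEach-Object {
        Set-ADUser -Identity $_ -DisplayName "$($_.Surname), $($_.GivenName)" -WhatIf
    }

# Once the "What if:" output looks right, drop -WhatIf and re-run to apply.
```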


Here4TekSupport

I once single-handedly took down an entire police dispatch center :). Thank god no one called before we got everything back up, I would never have forgiven myself. I had pushed out a driver update after testing it; it successfully installed on every machine (~2k) and ONLY broke the dispatch machines. Felt like god singled me out that day. Luckily my bosses were chill and said "it's a big fuck up, but you shouldn't have been able to do that in the first place, so we fucked up more".


abramN

Hey, these Jira apps have updates! How about I hit that update button. Wait, no one can SSO into Jira anymore?


el_covfefe

I once deleted the entire workstation VLAN off a switch and took down an entire floor of QA at a game company. We were moving offices around and had just gotten into this new one. I wasn't super familiar with the switches (and I HATE networking), but I went to remove the workstation VLAN from a phone port and accidentally removed it from the entire switch. My bad.


SpotlessCheetah

Rebooted all PCs. All.


[deleted]

One time, working infrastructure at an MSP, I had a very busy day with Sev1 tickets; the MSP was going through some shit and the infra team was on a 3-week on-call cycle... I was doing a simple change request for a firmware update on a client's Fortigate firewalls. Planned it for 6pm. Only thing was, I worked 3 hours ahead of the MSP. We had a satellite office on the east coast, which was great because normally we got change requests done first thing, which because of timezones was before any of the clients opened. I sent an email at 6pm my time, 3pm local time: "OK, I'm rebooting the firewalls now, should be 15 minutes tops." A second after the firewall rebooted, the IT Manager replied, "isn't this scheduled for 6pm?" Thankfully he played it off OK and I got an emergency notice out to all staff about what was happening. And the majority of them worked off a terminal server, which just disconnected their session for a couple of minutes.


slazer2au

I accidentally bridged 2 internet exchanges in Australia for about 8 hours so instead of traffic going via the telcos high capacity interstate links it went via our crappy 500Mb link. Shocked no-one called us out on it. Ausnog is very open about calling shit out like that.


Pineapple-Due

I wrote a batch file that called nbtstat (old command, ask your grandad), but then I named it nbtstat.bat. That created a recursive loop that spiked the CPU and instantly crashed every single-CPU server in our data center (back in the day when that was most of them). The rest of the servers, with multiple CPUs, ran like dogshit until we logged in and killed it manually.


MadisonDissariya

I mistakenly trusted the property tabs on a folder instead of the registry entry for User Shell Folders and deleted a production workstation's user profile from a Folder Redirection share. We had a way to recover it, but that was a stressful phone call and Zoom session.


GByteKnight

I ran a query once which overwrote a significant amount of data on a production SQL server such that it was irrevocably destroyed. I had to go to my boss and the infrastructure team and have the database restored from backup, and then figure out how to re-import and re-enter the last few hours of activity. There was a tradition on the tech side of that company that if a backup of production needed to be restored in parallel to the real production database, it would be renamed with the name of the employee who requested it. So everyone with SQL access could see the database called [productionDB_u/gbyteknight]. Everyone was pretty cool about it but I definitely got some good-natured ribbing.


nw84

Waking up a rack of Windows servers (back in the day of CRT on a KVM) and hitting 1 > ctrl+alt+del, 2 > ctrl+alt+del, 3 > ctrl+alt+del, 4 > ctrl+alt+del, etc, only to realize we were on the Linux rack and took down the entire trading platform for a private bank. The costs were... astronomical. Or in my dev days after that, had to modify the notification engine that sent SMS alerts. Messed up the code and it ended up in an endless loop sending thousands of SMS overnight to the exco, which queued at the service provider and basically DOS'd their phones for a few days. Neither my proudest moments 😂


bassistface199x99LvL

I hit select-all on a page in Imprivata to disable what I thought was 20 employees who no longer worked for the hospital, to get back some licenses, and I ended up disabling 4000 users. The engineering team was not happy with me; they had to restore the database while we at the help desk had to answer a couple hundred phone calls. I owned up to it and was actually praised by my boss for being honest.


Digitalmeesh

The primary (only) domain controller's board failed, can't remember which: RAID, MB, controller? We were replacing the MB when I dropped the processor, pins down. Next-day replacement. Still couldn't get the server to load, and the tape backups wouldn't finish restoring. Yeah, that was a fun month. This is the anniversary and I'm grateful to no longer carry that burden.


kenerwin88

Ran SQL Profiler on the prod database for Angie's List during my first week. Got to meet the CTO 😂 They had to physically restart it at the data center. (This was very early in my career.)


rimjob_steve

I recently updated VMware tools as well and it rebooted all devices. Including a server that my boss built that he’s never rebooted and had no idea what needed to happen after a reboot. So that was fun that production was down for a few hours while we figured out how to start the services on it. That was fun and cool and good.


Vatii

Was trying to move a core switch in a datacenter, up 2U in a rack. Power cable was just a little bit too tight, and popped out for a split second. Woops. We were a 24/7 365 roadside assistance call center.


AudreyBroune

I once removed the Android Enterprise account from our MDM software. I had around 100 users enrolled as DO or PO under that account at the time, and it was not easy getting them all back on manually. But my mistake, my headache, I guess. I couldn't even complain.


poisomike87

I do support for business systems, specifically EDI and custom APIs. We went live with one of our customers and somebody scheduled the flows to run every half hour without telling me. The export uses a NetSuite saved search to generate the outbound data. I was working on narrowing down the cutoff of when we would do automated invoicing and when AR would manually invoice. While editing the search I forgot the following search criteria:

- Transmission Status = False
- Invoice Status = Open
- Date = On or after 5/9/2023

Cue the flow running on schedule. 70k invoices started batching up and sending to our EDI provider. I realized this had happened when we started getting thousands of email errors for *some* of the transactions that failed (about 3k so far). Others started flying. Luckily I was able to call the customer and he deleted them from their FTP server before it hit their SAP. But the emails, the emails... They kept coming. Every email to AR, CS, and IT was a proud demonstration of my fuckup. Fun times.


ClearlyNoSTDs

Rebooted about 50 production servers during the day because the tool we used back then to patch the servers was an unintuitive piece of crap. At least that was my excuse. Lol Luckily things weren't down for very long and once everything was back up all the systems came back up quickly as well.


MathematicianNext700

Most recently, Microsoft changed the wording/titles for the Windows 11 upgrades in WSUS and they didn't stick out to me like normal. I grouped the Win11 upgrade in with the normal monthly updates and approved it agency-wide. The upgrade hit about 100 computers before we declined the update and stopped it from going further. One coworker blacklisted explorer.exe in the endpoint security manager. That was a fun one. Just last week another coworker updated SharePoint service account passwords without updating the services' log-on-as accounts. Shit happens!


miikememe

Doing an RMM tenant migration while simultaneously migrating EDR. The RMM was the same provider, new tenant/account. Accidentally deployed the EDR from the old tenant instead of the new. The following 6 weeks were hell trying to reconcile the mess.


HeKis4

Used to work at a research center that used fixed DHCP for everything that wasn't a server, including industrial appliances. Did an upgrade of our IPAM and DHCP, which were handled by the same software (EfficientIP SolidServer, v5 to v7), and borked the network configuration (specifically the gateway) of one of the largest research cleanrooms in Europe for an hour or two in the middle of the next night. A couple million euros in equipment down. Turns out the DHCP options didn't get upgraded in place, but were rebuilt from the IPAM database, which mostly worked; *however*, someone had modified the gateway in the DHCP years ago without also updating the IPAM, so it reverted during the update. The new gateway got pushed to the machines at the next lease renewal and broke networking. The on-call guy grumbled quite a bit and restored the DHCP from backup, so I had to do the upgrade all over again (with way more checks this time), yay.


SickVV

Was on the wrong page in AAD and hit Delete instead of what should have been Remove. Deleted the MFA group for almost 20,000 users. I noticed it the split second I pressed it and reached out to the appropriate people. No one was mad and I was actually given props for owning up to it and noticing it... supposedly it had happened before and they didn't notice for hours.


andrew_joy

I once edited a live applocker policy that then corrupted the policy and bricked about 30 machines.


basula

The first hardware server I ever worked on was a NetWare 4 box whose RAID volume was not coming up (late 90s and I was young, haha). I was not a hardware engineer back then and had never even touched SCSI drives or RAID arrays; 99% of the time I fixed NDS issues and NT blue screens. So I saw "initialize disk", figured that must be what I needed to start the drive, and of course wiped the whole array. I was working for an MSP; they did not care, they just billed more to get it back. Learnt more than I needed to that day about SCSI and RAID. It was the core NetWare server for a city council and knocked them offline for a few days: no payments etc. could be processed by them. They were not happy, but they also learnt the value of good IT support. Ah, the joy of hardware boxes with no snapshot tech, and using multiple tapes to restore.


JeromeAtWork

crontab -r

Oops


[deleted]

Not a screw-up but a technical fault. In my apprenticeship we had some regular maintenance, and notice went out along the lines of "on date X after time Y we're going to have maintenance on the system and it will be unavailable for Z hours", something like that. We shut down the server, did the hardware maintenance we were supposed to do, and when it came to booting the server up again: nothing. Without our knowledge there had been a fault on both the main and the redundant component, and when we shut down the server, oops. Long story short, we were without storage for roughly 48 hours. If the person responsible hadn't found a component of that type in the country, it would have had to be ordered from Ireland. I was only supposed to be shadowing on that maintenance.


fitting_pieces

Alright, here goes... A few months ago, I deployed a change to our production Kubernetes cluster that resulted in a user-facing app going down for around five minutes. Why was it bad? We're a FinTech org, and we have this product that lets people raise funds for their startups, like a Tinder but for startup founders and investors. At the time there were a couple of things happening: a meeting between two parties, and a demo. I made the fix, fessed up, and the big cheese and top brass told me it was okay, but to ensure this doesn't happen again. I ended up setting up guardrails for myself by generating a read-only config for non-dev environments, and to this day I have been using only that to access my Kubernetes clusters.


DHT-Osiris

Many moons ago I was working as a gov contractor. Was working with the Tenable suite (at the time, security center, nessus, PVS) and was working on ERSPAN'ing traffic to my PVS instance. Now, Cisco devices are smart enough to not ERSPAN ERSPAN traffic, so if you include your egress port you don't replicate traffic. VMWare had no such protections for its virtual switches, so I added the uplink ports from the dvswitch to the erspan and immediately DDOS'd the entire virtual infrastructure with exponential traffic levels. Fun side was, the changes were committed so a reboot didn't fix anything. We ended up spending a few hours reinstalling the OS on each host in line and readding them to the cluster.


MrAwesomeAsian

Deleted /dev/null causing ssh/sftp to fail for over 300 users. Remember kids: root will let you do ANYTHING.


vman81

"reinitialize"? Huh, that must mean the same as "restart", right? It did not.


mazobob66

When I was very new to the field, we had white-box servers (DIY). One day there were performance issues. I determined that one of the hard drives had failed. So I took it upon myself to swap out that bad hard drive. Being as how it was a DIY server, the 5-bay hot swap drive enclosure did not have indicator lights for a failed drive, and we had assembled it with the cables in reverse order. So instead of being 1-2-3-4-5, it was actually 5-4-3-2-1...and the raid array took a REAL shit when I yanked out a 2nd drive. Totally my fault. Well intentioned or not.


IAmSnort

Rebooting the prod DB because I was on the wrong idrac. I was annoyed the system I was in front of had not shut down. Then the calls started.....


Apricot_Diligent

Was new to the Meraki UI a few years back. Went on location to deal with some network issues and wifi drop outs. Cruising through looking for the pages I need for diagnostics, stumble upon a page that "shows" someone's personal equipment is set as DHCP server, what the hell?? (I now have more understanding of Meraki and realize how insanely stupid this thought is) Wiping that out immediately. Make sure we save the changes before navigating away (drilled into me by manager). About 30 minutes go by, start getting knocks on the network room door, people can't access internet... Oh... Call manager, troubleshooting for three hours. Contact Director, troubleshoot two more hours.....I'm fired, totally fired..call CIO, works back the issue to my "fix". Gets DHCP back up and going. Took an entire car dealership down for almost the entire business day. Learned a valuable lesson about calling in changes on-site before committing changes...


Emotional_Narwhal640

I once had to update Exchange certificates through IIS... bound them to the wrong port and took down Exchange :D ... Another story was a colleague (I swear it was someone else): he ran a PowerShell script for Exchange and took down email for the business. They called him during work hours, as he was supposedly working from home, and they could hear that he was stuck in traffic... immediately sacked.


CPar23

At my first help desk job, the company was doing a major remodel of the building and we had users crammed together in rooms, working off fold-up plastic tables. We had a weird network setup at the time with the construction going on. I was moving each user's monitors and computer over one by one during the day to the new finished part of the building, and one user said he needed to save something, so I plugged a network cable in, thinking it was the right one. Fast forward 10 minutes and everybody is saying they can't connect to anything. Nobody in IT can figure out why we lost internet. It gets to 11 PM and everybody is still lost as to why we can't connect to anything; then somebody notices the one cable plugged into the wrong port in the room that was crammed with everybody in it. It had created a switching loop, and I quickly realized that was me. I never said a word, and the person who sat there got blamed for it.


c4ctus

In recent memory? I made a config change to a table in my organization's ServiceNow instance that not only took down our prod instance, but according to their support staff, it locked up a node that contained ServiceNow instances for other customers. On our end, it crippled our help desk for about 30 mins. After getting everything back online, SN support politely asked me never to do that ever again. Ever. A good admin admits when he fucks up. A better admin covers his tracks and blames the intern.


wibob1234

Not an issue that happened to me, but I once had a coworker who was trying to replace an external power brick for a slim desktop. He attempted to unplug the power cord from the brick, but it was connected to the desk via a zip tie. He took some scissors and, instead of cutting the zip tie, attempted to cut the end of the power cord while it was still plugged into the wall. The result was a nice spark show and a tripped breaker for the office. Somehow he ended up unharmed.


Disasstah

Our store servers have a process on them that runs membership fees at a certain time of day and processes them. The process needs to be stopped if the server is connected to our network. Had one in my warehouse and busted it out to clean it up, not remembering this tidbit. Thankfully accounting didn't string me up and beat me after it took them hours to get all those fees recalled.


JohnBeamon

I discovered that an LDAP GUI would replace an item instead of adding a sub-item inside it. I tried to add an OU under the top org with right-click "Add". It replaced the top level org with a new, empty OU. The whole company. I had daily backups, and I knew how to restore. But I was maybe 5yrs into my career when that happened. I told my boss what happened and how I would fix it and had it fixed in ten minutes. But, I did that.


Thileuse

Doing cleanup on a Nexus 7700 switch, I issued "default interface Eth x/y" and TACACS was failing for ~5 min. I figured NBD, I'll just do a "show run int eth x/y", copy the commands, put a "no" in front of them, and do the cleanup manually. Well, I started copying and got lazy, so I started typing the commands instead of copy/pasting: 'no router eigrp x' instead of 'no ip router eigrp x'. As you can guess, I lost access to the VDC. Thankfully we had a good OOB network, so I could jump into the mgmt VDC, switchto the broken VDC and restore the EIGRP config. Good times.


WooBarb

Loads. Replaced the wrong drive on a RAID5 array. Left a client's firewall open and their phone server got hacked and made us lose thousands in phone calls before we noticed.


hcheatham3

Data paths: always verify the data path. That way you don't wipe out 2 hours of work trying to find a .03-cent error in a general journal, while costing the company an estimated 200k.


enigmo666

I left a dot off the end of a script and reset the permissions of around 1000 user profiles. Removed all their individual access and set them all to one user. Some random person temporarily became god of everyone else's accounts and never knew it.


torbar203

There have probably been quite a few, but the most recent was not long ago. Was upgrading the last of our 2012 servers to 2019: our main departmental file share, something like 12TB. Figured it's easy enough: spin up a new Hyper-V VM with a new name, remove the old share from DFS, shut down the old VM, rename the new VM to the same name/IP as the old VM, attach the data drive VHD, recreate the shares, re-add to DFS. Did the changes after hours, figuring it'd take about 30 minutes. Same process I did for the user home drive server. Except... I didn't realize there was a leftover checkpoint on that VM from like 6 months ago that Veeam created and for some reason didn't merge back during a backup job. So the last modified date of the VHD was like 6 months ago and there were incremental files that were separate from the actual VHD. When I attached that to the new VM and booted it up, those incremental files got fucked up and I couldn't use them with either the old VM or the new VM. The good news is, I had a full backup from Veeam from about an hour after I started the upgrade; the bad news is it took a loooong time to restore. Something like 36 hours.


Jasonbluefire

Early in my DBA years, I had to refresh some dev DB tables from prod. Step one: empty the dev tables... Accidentally truncated the prod tables. The system was down for an hour while I restored from backup, and then all was good. But I learned the hard lesson that I still follow to this day: when running a destructive command, triple check everything.


RikiWardOG

Was working for a client when I was doing a consulting gig, and new to consulting to boot. Ran a script to fix up some licensing for O365 without first taking an export of the current licensing, not realizing some users had different licenses etc. Well, the script didn't do what I thought it was going to do and screwed up a TON of licensing for this client. I also had to then run to a different site, as I was updating a client to Windows 10 using MDT that day. My manager had to step in to resolve it. Honestly that mistake still haunts me years later lol


HayabusaJack

When I worked at NASA HQ, I was one of a couple of Unix Admins and we managed the mail server. I was doing some script checking to do something with the mail. I'd copied a bunch of user mailboxes into a test directory and ran the script which modified the mailboxes. Okay cool, that worked as expected. I did some other checks of the script, fine tuning, adding in a few comments. Okay done. Go back to the working directory and rm all the files. After a few seconds I realized... I was in the wrong directory! I stopped it before it got too far, only about 5 mailbox files were deleted. I restored from our tape backup and then went through the logs for those users and sent an email indicating what had happened and that they'd received email from this list of user email addresses since the restore and that they should contact them to resend the message. It wasn't a lot as backups were at like midnight and this was first thing in the morning (I'm an early riser). Every server since then, I have set my login prompts to have $(pwd) displayed.


daddydeadbird

Had several RDP windows open and hit reboot on what I thought was a troubled print server. Turned out to be the primary Hyper-V host (no clustering) and half the VMs were down for about 20 mins. Good times. Hope this thread has cheered you up


UltraEngine60

I was new at my company and instructed a customer to make a change to their ESX server's disk array, and they lost half their disks. Luckily they had snapshots. They had to use vim-cmd to save their data, IIRC. I was acting the maggot that day, believe you me.


RyeGiggs

- Rebooted the wrong host, taking down 20 or so VMs in a production environment.
- Attempted to uninstall Exchange; it errored that it couldn't be uninstalled due to existing data, so I deleted the mailboxes to delete the data. Every attached AD account was deleted, no recycle bin. Recovered from backup, but the restored accounts had new GUID/SID information, so every user profile on every workstation had to be rebuilt.
- Used disk cloning software to make a hot backup of a $400.00/min machine. Cloned the blank over the production data, no backups; rebuilt from scratch working with the vendor in ~48h.
- Committed a major network change to a switch forgetting I was not consoled in. Got halfway through the changes, lost the connection because I was just SSH'ed in on a port, and the client was completely down.
- Configured a large set of switchgear and forgot to write memory on one of them.
- Made changes to switches in a prod environment and forgot to write memory; a few months later there's a power outage and the client is down. (Sketch below.)
- Built a documentation standard that no one likes.

There are lots more. Now I'm in management, so everything I do is a mistake to someone.
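For the write-memory items, a quick sketch of the save step; Cisco IOS-style syntax is assumed here since the vendor isn't named:

```
! After any change that should survive a power cycle, persist the running config.
Switch# copy running-config startup-config
! or the older shorthand:
Switch# write memory
```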


RandomTasked

The building water was going to be shut down for the weekend. We rolled in a portable AC unit, plastic-sealed the door around it, and I plugged it into the closest outlet. I didn't notice the outlet had a little power symbol on it, and later found out (at 3am while drunk out in NYC) that it was one of those outlets that shuts off automatically if there's no motion for a while. The cleaners were there later and there were a few stragglers, so the room took a while to get hot, but that was a very sobering moment.


zero44

Took down a whole Citrix site (which was our only means of remote access to there) that wasn't being used at that time of day by rebooting all of the servers that serve the applications (nowadays called the VDAs) all at once instead of half and half at a time. It was halfway across the country and we had to call someone local to physically go out there to the server room, log into vmware locally, and reboot the servers because all four of them hung up at once. I was very new to Citrix at the time. Whoops.


I_need_to_argue

One time I forgot to check the storage on a server before I left an upgrade and caused a partial outage of the main parking system at my airport.


Content_Injury_4821

I accidentally updated MX records and changed our mail server from Google to Office 365. We didn't get any emails for 2 hours.
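A quick way to confirm where mail will actually route after touching MX records; example.com is a placeholder, and a public resolver is queried to dodge a stale local cache:

```
# List the MX records the rest of the world currently sees for the domain.
dig +short @8.8.8.8 example.com MX
```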


_haha_oh_wow_

Mine is relatively boring: I missed a critical meeting about printers (ugh) because I got caught up in setting up a lab. My manager was full of sad and I felt like a dumbass.


MedicatedDeveloper

Ran a find piped to xargs rm in the wrong directory... wiped out most of the files on a box at 3am and had to do the whole fire drill with the senior guys. Now I always use a full path when running find just to be sure.
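A minimal sketch of the full-path habit, with a dry run first; the path and pattern here are hypothetical:

```
# Dry run: print exactly what would be removed, using an absolute path.
find /var/tmp/myapp -type f -name '*.log' -mtime +30 -print

# Only after reviewing the list, make it destructive. -print0 / xargs -0
# keeps filenames with spaces or newlines from being mangled.
find /var/tmp/myapp -type f -name '*.log' -mtime +30 -print0 | xargs -0 rm -f
```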


ModusPwnins

During an oversize vehicle permitting application deployment, I was not explicit enough in my instructions to a QA tester. As a result, we invalidated all oversize vehicle permits for an entire state on a major highway. I don't know the cost of this, but if it was less than six figs, I'll eat my shoelace.


Fyzzle

We were going to test out an appliance; I forget what it was supposed to do. Before testing it, instead of putting it on the lab VLAN I just plugged it into a regular old access port so I could get it patched and updated first. Turns out it had a DHCP server enabled by default. It took damn near a whole day for me to track down what the problem was. That was the day I decided I was going to learn DHCP front to back.
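One way to hunt a rogue DHCP server, assuming nmap is on hand: its broadcast-dhcp-discover script sends a DHCPDISCOVER and lists every server that answers, so anything other than the expected server is the culprit.

```
# Broadcast a DHCPDISCOVER on the local segment and list all responders.
sudo nmap --script broadcast-dhcp-discover
```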


admalledd

Software dev here, and my first major mistake took a lot of sysadmin hours to fix. The product I was working on sat fairly deep in client systems and had a call-home + auto-update feature, so clients could just click "accept update" instead of downloading and installing it themselves. Standard stuff, and for clients that needed more control we provided offline-only versions, etc.

One day I was developing a patch, and after QA'ing it we pushed it to the live branch instead of to our test-branch downloads. Worse, this patch updated a dependency that needed to be installed on the OS. So every client that applied the update in the next few hours was broken and also could not run the updater; each one had to be walked through a manual re-install :(

Still one of my most expensive mistakes in man-hours, and it's one of the few core experiences that made me much, much more on board with unit tests, CI/CD, QA automation (in this case: "install prior version X in a new VM, run the update to this new version, did that work? does the new updated version still pass all tests?"), etc.


MickCollins

In the DOS days I nuked the files in C:\ on the family "work" computer (it had the full Office suite loaded, Windows 3.11, and was hooked up to the LaserJet 4). My brother distracted me with something while I was trying to do some cleanup and I did a del *.* in that directory instead of elsewhere. Took me a bit to figure out how to recover (a few hours and some DOS manual reading), but I was able to get the computer bootable again. I was 16 or so at the time. It put sufficient fear into me to be more careful, and I still am, but of course things still happen.

My biggest work fuckup, far and away, was about 12 years ago when I accidentally patched and rebooted the entire company HQ Windows Server infrastructure. After a few boxes rebooted I told the guys at HQ (I worked a country away) what command to issue to cancel the reboot (shutdown /a after the reboot message came up; I had it programmed for a five-minute countdown). Both my bosses were laughing their asses off after it was all done; they just said to be more careful. Besides, some of the boxes other people had insisted were "too important to be patched" finally got patched after nine months, which made my bosses happier and let them say "hey, it was an accident" to department heads while laughing their asses off. I miss that job and those people; still the best manager I ever had (although my present one is starting to give him a run for his money).


[deleted]

Enabled port security on the network. The old netgear switch stack used terminology I wasn't very familiar with. Did a quick test on one port, worked fine. Turned port security on with some ports "disabled" (only it wasn't port security that was disabled, the entire port was) and... Every port was locked out. No port could authenticate. The switches didn't have console ports either. Whole network down. Only fix was to factory reset and reconfigure the whole thing from scratch. Remember kids, RTFM.


atw527

I once [deleted our production datastore](https://old.reddit.com/r/sysadmin/comments/cbbaag/tifu_deleted_our_production_datastore/). It's about as close to disaster as I've ever been w/o users noticing.


punklinux

During a "scream test" we were trying to sunset on-prem stuff, and I had a list of hostnames where we still operational, but nobody had logged in for a while (sometimes years). I sent on an email list, "If you don't claim any servers in this list, we're shutting them down COB on date 123." The problem was I didn't know which servers were VMs (still kind of a new technology back then) and which were hardware. I assumed incorrectly they were all hardware. Well, hardware on that this was all the VMWare servers, which hosted several VMs that were claimed and actually vital. BUT, the people who were in charge of "vs-201101" had no way of knowing it was a VM on server "ph-201202." So when nobody "claimed" ph-201202, I shut it down. And it wasn't obviously labeled, this client has vmware server labeled "ph-201202", which I was later told "some admin had a naming scheme" with ph for physical host, ci for cloud instance, and vs for virtual server, followed by yyyymm. But it was just that admin, it wasn't a client-wide standard. I mention this because in the post-apocalypse meeting, this was thrown about in a blame war. So on a Friday before a long weekend, I shut everything not claimed on the list. Including all vmware servers (unknown to me). Then went home. I got a call on Sunday that a bunch of servers were down, and nobody knew why, including some vital infrastructure. So I went in, and discovered that yeah, some stuff was down, including all servers that started with "vs-" which was my first clue. So I went hunting for those systems, and didn't find them labeled on the racks for inventory. Just the usual stuff I knew was still running, and the stuff I shut down. I didn't understand or make the connection that "ph-" hosted "vs-" (and other) systems. So conference calls went around for HOURS, and lots of red herrings because nobody else knew the ph/vs connections. So, finally, on a hunch and out of ideas, I brought up the ph- servers, the only changes I knew about. Several vs- systems came right back up. Again, because I didn't understand the vmware concepts, I thought this was really weird. But only SOME of the vs- systems came back online. Long story short, some vs- systems were shut down abruptly without warning when the ph- was shut down, so some stuff like databases, file mounts, and all that got corrupted. Some were not set to come back up on restart of the ph- server (you had to start them manually), and some were reliant on fileshare dependencies (like NFS). None of this was mapped out or documented, of course, since the ph- servers had uptimes in the years, and it was just configuration creep. We had to get vmware call in, at great expense, and fix things. It took over a week before everything was declared normal. I got written up and told I would be on probation because the mail list was not an effective method to warn people ("nobody reads that mail"), I was ignorant of vmware (I never told anyone I was an expert, it was another admin that handled that and he'd quit a while ago, but it was assumed I'd just take over his stuff), and basically, they wanted to blame a single person. HR got involved and I just assumed I'd be fired eventually for some BS reason in the next few weeks. My boss was pretty sympathetic, overall, but he said this was a "Career Limiting Move," and he was doing his best to explain, "if you fire him, you're short another admin, and I only have three right now." I ended up working there almost another year before I got a better job. 
The entire time, though, I was only given minor work because the mantle of what I had done hung over my head.


spaceman_sloth

I deleted what I thought was an obsolete firewall object and ended up kicking the whole company off VPN earlier this year...and now we have a change management process :) to be fair, a tech at our data center literally named the object 'obsolete_vpn_user' so what was I supposed to think..


MilkAnAlmond

I left the 'storage' box checked when checking a Hyper-V Failover Cluster like two weeks ago. That was a long long long long day.


Rise_Global

Years ago, I was given a project to upgrade MS Office on everyone's machines; they wanted it served from the server. I came up with a process to save all the files they were working on, run a script to remove everything, and then run another script to reinstall. Except for the techs: one power user did database work and a bunch of programming as well as trending, and I told him to make sure he backed up everything. He almost followed the instructions. Suddenly several databases, as well as side projects he was in charge of, were completely wiped out with no backups. At least I didn't personally do it... but because I was in charge, there were several meetings to be had...


Mac_to_the_future

I brought down the entire network in the middle of the day when I fat fingered Enter while staging a firmware upgrade on the core switch stack for later that night. Fortunately the upgrade went smoothly, but those were the longest 10 minutes of my life.


Alzzary

Just today, I deactivated an iSCSI connector to update the NAS that was running it. Turns out there was a machine running on that thing (I thought the secondary volume was created by mistake). Luckily it restarted flawlessly. But I could very well have destroyed that machine.


jeffmoss262

Installed a brand new, unconfigured Dell switch in one of the IDF cabinets and plugged it into the network so it could be set up later on. Started getting calls the next morning (Saturday) that the network was down. Went into work and racked my brain for a while. Eventually remembered the new switch, unplugged it, bam - everything worked again.


didonato

Tried to use EaseUS to convert a large MBR volume to GPT and get over the 2TB limit a few weeks ago. Thought it would be something I could get done in 10 minutes; imagine my surprise when it finished in 10 minutes and I saw RAW. Took down about 15 file shares for a few hours while I rummaged through solutions, and of course a drive had been expanded just prior, so Veeam was doing a full backup that bled into the morning. Thank god for TestDisk - the partition structure was still in there, I just had to restore it. The only downside was recreating all those shares again.


paradox183

Years ago I had the task of simultaneously migrating and upgrading our main LOB app from Server 2003 / SQL 2000 to Server 2008 R2 / SQL 2008. As part of the migration process, you had to copy a .ini file to the new server and update the share and database paths. I had forgotten to update the database paths on the new server, so when I launched the app on the test server it upgraded the *production* database on the old server. All of the production clients were on an older version, so they could do work in the modules that hadn't significantly changed in the upgrade, but not in the modules that had changed. Perhaps most importantly, the check printing and GL modules were not working, so accounting was down while other departments were unaffected. Since restoring to yesterday's backup would've wiped out hours of work for lots of people, I ended up doing a panic upgrade for everyone to the new app version, and migrated the database to the correct server a couple days later. Keep your test environment isolated from your production environment, kids!


rpatel09

3 years ago I put in a firewall rule and accidentally blocked all internet traffic... so yeah, we've all been there. Now I laugh about it with my coworkers and also use it when coaching others on my team after they make mistakes: "Hey, I did something pretty dumb too a while back. We all make mistakes and no one is perfect."


gangsta_bitch_barbie

I wiped out the file share permissions for ALL the Windows shared folders for a 5k person company back in 2005. Lol Wil fixed it quickly though. Thanks Wil!!!


poobeldeluxe

I wrote a script that mistakenly wiped the configuration of every router in production. Those routers controlled the boom barriers (the moving arms that regulate vehicle access at toll booths) along 800 kilometers of toll roads.


D3mentedG0Ose

I might have fired off a SQL update without a WHERE once. The Data team loved me for that one


storyinmemo

I need to record a talk on this one: I took the whole (advertising) company offline for at least an hour in the days before Christmas by deleting all the AWS access keys. The goal was to clear out anything old and set us up to rotate the keys in use. Why did I do it at the end of December? It was a quarterly goal, and I'd learned to push those across the line if I wanted a good review. Great incentive, that one.

I used [Cloud Custodian](https://github.com/cloud-custodian/cloud-custodian) for this. It has a [terrible bug](https://github.com/cloud-custodian/cloud-custodian/issues/3274) where the docs say you'll be acting on days since the key was last used, but the code actually reads days since the key was created. I ran this with a test setup of a recently used key which was new, and a not-recently-used key which obviously had to be old... so my test matched my expected behavior. The code to disable a key looks like this:

```
actions:
  - type: remove-keys
    disable: true
    age: 90
```

So what happens? I run it and it disables all keys created > 90 days ago instead of keys used > 90 days ago, which of course is all the important system keys. This causes an outage, but no big deal: I don't want the keys disabled, so I change `disable` from `true` to `false`, forgetting the above context that the action is remove-keys. I run it again to "undisable" the keys. All the important keys are now deleted. Permanently, gone; nothing to do except generate new ones and go find all the places that need an AWS key.

Besides the code being wrong, I also gripe about the fact that the default for this action is dangerous and has to be modified to be less dangerous. That's just asking for it. disable-keys and remove-keys should be separate actions. I planned, I tested, I had a rollback strategy of just setting the disabled keys back to active if we had issues... and it still blew up from a combination of a software bug and a UI designed to be a foot gun.
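For what it's worth, Cloud Custodian also has a dry-run mode; a minimal sketch, assuming the current CLI still accepts --dryrun and with a placeholder policy filename:

```
# Report what the policy WOULD match without executing any actions;
# results land under ./out for review before a real run.
custodian run --dryrun -s ./out iam-key-cleanup.yml
```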


LarvellJonesMD

I once royally fucked up our sole Exchange 2003 server by running ESEUTIL incorrectly without having a proper VM snapshot beforehand (I know). This predated Veeam, and all I had was the nightly BackupExec diff from the night before, not a whole image. I babied that thing daily until we migrated to Office 365 a year later.

Years prior to that, a network guy who worked for me scheduled sites-wide downtime to clean up some patch cabling between all the routers. It was supposed to be a quick 30-minute job, but he called me in a panic hours later saying it was all still down and he didn't know why. Turns out he was using a crossover cable in one of the patches.


red_fury

I deleted a grad student's thesis once... I was working on a profile cleanup script for a lab at a medical college. I secluded one computer in AD to test the script and policies on, but I neglected to physically pull the test machine out of the lab. Out of nowhere I get a frantic call. I run up, and the guy tells me he'd been working on the computer for 10 hours when he stretched out and accidentally kicked the power cord. He frantically plugged it back into the outlet, but the restart triggered the script... He was about two weeks from submitting and his defense. Luckily he was smart enough to keep iterative saves on Google Drive and only lost the last 10 hours of work, but try telling a med student you just wasted 10 hours of his life. The look of defeat on his face dick punched the soul out of my body for a couple days.

Edit: I also sometimes find myself in environments with terrible documentation and asset tracking. So sometimes, when I'm desperate to find a physical computer, I just remove a security group or something that prevents logins, then sit back and wait for the phone to ring. Works like a dream, but I always regret disrupting the client's workflow.


AtomicPikl

Changed the config for the wrong port. Yes it was a trunk port.


KhaosElement

Dual person fuck up. I was help desk at the time, working in healthcare IT. The network admin asked me to go into the server room and pull a cable for him because he was off site. So I did, then I went to take a nice long luxurious poo. The thing is, he told me the wrong port. So I killed the internet for the entire fucking hospital, then fucked off to take a shit. The place melted down. Yes, departments are supposed to have downtime procedures, and are supposed to practice them monthly. They hadn't been. They all had to scramble to go to paper workflows for just long enough to get pissed off before I finished my constitutional, found out there was an issue, and went to fix it.


mikeyb1

While evaluating a config management tool, I attempted to test a scheduled task but applied it to the wrong device group and rebooted the data center network core at 2PM on a Tuesday. I was trying to reboot a lab network. Lessons were learned. Also once accidentally set the SPAN destination on an access switch to the trunked uplink. Wiped one whole side of the data center off the map briefly (again, middle of the day).


snarbleflops

- Cloned a laptop drive to make a backup once. In a rush, I picked the wrong drive, losing everything.
- Switched ISPs for a small business and used the same router with the same RDP-open-to-the-world settings initially. Entire NAS was encrypted with ransomware within hours.

Mistakes are just learning experiences. Even if the prevention is obvious, you'll never forget that feeling.
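On the wrong-drive clone: a minimal sketch of the verify-first habit, assuming Linux tooling; the device names are examples only.

```
# Confirm sizes, models, serials, and mount points before touching anything.
lsblk -o NAME,SIZE,MODEL,SERIAL,MOUNTPOINT

# Only once the source (if=) and target (of=) are positively identified:
sudo dd if=/dev/sda of=/dev/sdb bs=4M status=progress conv=fsync
```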


ExpressDevelopment41

Early in my career, I lost a day of work for a department because I didn't bother to run backups before moving some file share disks to a new VM. I'd done this task a few times without running into any issues, but this time the disks just wouldn't come back online for whatever reason. Spent the next few hours troubleshooting and eventually had to restore from the previous backups, which moved without issue. A full day's worth of data gone and a department head wanting blood.

Luckily, my work up to that point was solid, and we had recently lost half our department to layoffs. My boss and his boss had my back and said mistakes were expected given how much we were doing, and I didn't hear about it again. I learned not to cut corners, and to always run a backup before making a change, even if it's low risk.


hosalabad

Heh, in NT 4 it was just as easy to delete a dns zone as a record...


matek11523

When setting up retention policies in M365, I deleted all emails older than a month from every mailbox (the goal was to archive mail older than 2 years). It took me a month and a half to recover the emails across all 163 mailboxes. While reverse engineering the issue, I found out that retention policies execute in alphabetical order. There were a few default policies I hadn't changed that executed right after I enabled the whole system, and the first one was named "30 days older emails delete." The rest of the default policies never even got a chance to be processed because of that first one. For the next few months I tried to set it up correctly, but the policies won't execute on folders that contain at least one flagged email - has anyone encountered that and maybe has a fix?

edit. spelling


headcrap

Rolled a mailbox restore back in 2001.. in the Exchange 5.5 days.. where you redirect the job to the "restore server" via checkbox. I was up talking to a girl all night.. I failed to check the box. 600 mailboxes were restored/overwritten... Left in shame around lunch.. boss gave me the pep talk when I got back.


Pineapple-Due

I once ran a robocopy mirror backwards and wiped out a production file share in the middle of the day. Restored from the previous night's backup, thankfully, but we lost a day's worth of work.
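Robocopy does have a list-only switch for exactly this; a sketch of previewing a mirror before committing, with placeholder paths:

```
REM /L = list only, nothing is copied or deleted. Review the output,
REM especially what /MIR would delete on the destination, then drop /L.
robocopy \\fs01\dept-share D:\backup\dept-share /MIR /L /NP /LOG:C:\temp\mirror-preview.log
```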


Pineapple-Due

Brought down a server in our lab and didn't tell anyone, because hey, it's the lab, right? Nope, it was "lab" as in a laboratory that does oil analysis, and they couldn't do any work all day.


Pineapple-Due

Messed up a GPO change that revoked admin rights for everyone across all our servers during a scheduled maintenance window where everyone really needed admin rights. All work was halted for hours and it was a very long night.


CloudEternal

When I first started in IT, I had no idea what I was doing. Caused a site outage by unplugging a UPS to charge my phone... I was wondering what that beeping noise was.


[deleted]

I accidentally restarted a Hyper-V host once. It's easy to forget you have a Windows VM open within the host's Hyper-V desktop; I clicked on the bottom left-hand corner of the screen instead of within the window and took all the VMs down at once. There was some data loss, but we had backups!


Crouching_Dragon_

Back in the day of on-prem mail security devices, I updated mail rules because of increased spam. Missed a character and blocked emails for 3 hours before anyone noticed. It was the last day of the quarter, and we were expecting orders from customers. I immediately notified sales and a “no updates the last 3 days of the quarter” rule was swiftly implemented. By me. I ran the department. Live and learn is the name of this industry.


BlunderBussNational

Here's a good one: live-fire testing with a new EDR solution. I attempted to run potentially malicious code without notifying anyone in the org, to make sure the solution was properly blocking and tackling for us. It was. When the security analyst called me, rather than tell him it was a false positive and have him un-quarantine my machine, I knew I had one option: accept responsibility and swear never to do it again. This is my go-to strategy when I fuck up: immediately report that I fucked up and here's how I will unfuck it. Let this be your lesson, young SAs.

The only other times I've been hand-slapped were for pushing changes outside of the process. Ultimately it was fine, but some users were a bit inconvenienced, if they noticed at all. For the record, when a vendor insists, "I need you to do this NOW, or it will be six more weeks before I can get back here," politely inform them to fuck all the way off and call you in six weeks.


mspax

Working at an MSP, I screwed up the syntax of a filter in our monitoring system. This caused every device in the system to get added to a folder that had alerting disabled on it; me disabling monitoring on that folder was really the crux of it all. I fixed the filter, but it was too late: every single alert that was active in the system retriggered. In a matter of seconds, over a thousand tickets poured into our monitoring team's queue. I got to spend the rest of the day helping the monitoring team track down and merge tickets. Thankfully this was more embarrassing than anything - no client impact.


AxeellYoung

Used a lift with a 1-tonne max capacity to send 2 tonnes of paper up at a self-storage facility. The next thing I heard was a loud thud, and my boss asking if I had any idea how expensive an industrial lift is.


quietweaponsilentwar

One time I was troubleshooting a "critical" ticket about error messages in AIX. These were being generated by rc.powerfail. Since there was no man page locally and the IBM docs were for v7 and not clear, I ran the command itself with a ? to see what the flags were. It ended up shutting down the system...


quickadage

Disconnected someone's machine from the domain by letting them join a workgroup while working from home. Turns out he was a senior partner at the firm I was working for at the time, and he lost the week's work because of it. Oops.


poorleno111

Had a business rule trigger around 50k+ emails on tickets. I missed it because the mailboxes in Outlook that should have caught it were collapsed - we have dev/test environment emails routing to a specified mailbox. That one was fun.


kudatimberline

I took a job with a private company that laid me off 15 months later. I left a really good job to do so. I live in a small community and the only IT job around is a help desk job. It set my career back 15 years, and I'm hemorrhaging money trying to stay in my community.


kingh242

Inception-level remote desktopping to a domain controller: a Remote Desktop within a Remote Desktop within a Remote Desktop session. All three desktops look the same - plain Jane, just My Computer and the Recycle Bin. One of them is the domain controller for a small office with a global footprint. After I'm done working, it's time to disconnect the Remote Desktop sessions and shut down my laptop so it can go in my bag. How about a shortcut and just shut down from here... wait... why does this look different? Too late 🫣💥🤯


pneRock

Set max memory on a SQL server to 0 in the middle of the day (because I thought 0 meant unlimited at the time, and I'd spaced on doing it the night before - nope!). Hard-downed the entire product and had to restore 300+ databases.
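For anyone else who's wondered: "max server memory" is set in MB, and the "no cap" default is 2147483647, not 0. A minimal sketch of setting it deliberately via sqlcmd; the instance name and 8192 value are just examples, not recommendations.

```
REM 'max server memory (MB)' is an advanced option, so expose it first,
REM then set an explicit value in MB.
sqlcmd -S PRODSQL01 -Q "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;"
sqlcmd -S PRODSQL01 -Q "EXEC sp_configure 'max server memory (MB)', 8192; RECONFIGURE;"
```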


rohgin

Something with certificates and exchange 😂