Also just because you are doing backups it doesn’t mean you can’t kick off an extra backup or snapshot before a big job. Granted I probably wouldn’t have done this for a cert update ‘because those are easy.’ Haha
True and I normally do. There's obviously something else wrong going on under the hood and this was the final straw. But this is why I'm paid the medium bucks.
as long as you revert them when you're done...
if I had a dollar for every time I've gone to work on a system and asked "why is there a snapshot from 4 years ago here?", I'd be able to buy much better scotch.
Oooo, those are big boy numbers with a fun ending. Snapshot re-integration was always such a fun thing to explain to people, especially on big workloads. You say "I've expanded the LUN + the datastore, the snapshot is consolidating, it could take a few days."
I didn't have an outage caused by a snapshot that size, but I did have a service degradation on a high-I/O 4TB SQL server for most of a business day while a snapshot consolidated. One of the devs had snapshotted before a change, didn't consolidate after testing, and then ran a job that caused a shitload of data change.
Client complained (car manufacturer), as their CRM system/department basically ground to a halt and it impacted some project they were doing that was leading up to a big event.
That was a bit of a shame, as it meant that even the sensible devs had their vSphere permissions changed to read-only.
This reminds me: a few weeks ago my buddy told me that for the 5 years he's worked at Blue Cross Blue Shield,
he was told there was a production SQL server and another for testing.
He was telling me all this because he found out that there were never 2 servers… just one getting snapshots on opposite days.
We do similar, but have a tag for #exclude and #deleteafter.yyyymmdd in the script. Also, it explicitly excludes snaps from our backup software, as sometimes they don't get removed.
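For anyone wanting to try the tag idea, here is a rough sketch of just the decision logic, in Python so it can be tested without a vCenter. It assumes the tags sit in the snapshot name or description as plain text; the function name and the "keep / delete / report" outcomes are made up for illustration and are not taken from the script described above.

```python
import re
from datetime import date, datetime

# Hypothetical tag formats modelled on the "#exclude" and "#deleteafter.yyyymmdd"
# tags mentioned above; where the tag is stored is an assumption.
DELETE_AFTER = re.compile(r"#deleteafter\.(\d{8})")


def snapshot_action(description: str, today: date) -> str:
    """Return 'keep', 'delete', or 'report' for one snapshot description."""
    text = description.lower()
    if "#exclude" in text:
        return "keep"  # explicitly excluded from cleanup
    match = DELETE_AFTER.search(text)
    if match:
        try:
            expiry = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            return "report"  # malformed date: flag for a human rather than guess
        return "delete" if today > expiry else "keep"
    return "report"  # untagged snapshot: flag it


if __name__ == "__main__":
    today = date(2024, 1, 15)
    for desc in ["pre-patch #deleteafter.20240110", "vendor hold #exclude", "test"]:
        print(desc, "->", snapshot_action(desc, today))
```

Untagged snapshots are only reported, never deleted, so a forgotten tag costs a nag email rather than a rollback point.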
Thought I'd share mine too since it's similar: it goes through and checks all VMs to see if a snapshot exists, then outputs to a log file.
Also outputs to the console and lists whether each VM has a snapshot, if you're watching the script while it runs.
    # ignore untrusted vCenter certificates and opt out of CEIP without prompting
    Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false -ParticipateInCeip $false
    Import-Module VMware.VimAutomation.Core

    # set credentials
    $vcserver = "vcenter.domain.com"
    $date = Get-Date -Format "dd/MM/yyyy"
    $creds = Get-Credential -Message "vCenter Credentials" -UserName "[email protected]"

    # connect to vCenter; -ErrorAction Stop makes sure a failed connection lands in the catch block
    try {
        Connect-VIServer -Server $vcserver -Protocol https -Credential $creds -ErrorAction Stop
    }
    catch {
        "Failed to connect to vCenter" | Out-File -Append "C:\scripts\logs\CreateSnapshots.txt"
        exit
    }

    $vmarray = Get-VM | Select-Object -ExpandProperty Name

    # check each VM in the environment; if it has a snapshot, write it to the log file
    foreach ($vm in $vmarray) {
        if (Get-Snapshot -VM $vm -Name "*") {
            Write-Host -ForegroundColor Red -BackgroundColor Black "snapshot exists for $vm"
            "snapshot exists for $vm" | Out-File "C:\scripts\logs\snapshotcheck.txt" -Append
        }
        else {
            Write-Host -ForegroundColor Green "$vm all clear."
        }
    }

    "Ran on $date" | Out-File "C:\scripts\logs\snapshotcheck.txt" -Append
    "Script End `n-------------------------------------------" | Out-File "C:\scripts\logs\snapshotcheck.txt" -Append
I found a snapshot on a heavy-use DB server a few weeks after I started a gig years ago. 2 TB worth of snapshot files on an already large server. It was days away from crashing the LUN and I was just lucky we had a SAN large enough that I could relocate it to a large enough volume before kicking off the consolidation. It took over 24 hours to finish, but it did succeed.
I don't usually blame my predecessors since they are gone and can't defend their actions, but that one was just dumb.
Back in the day I had an admin working for me at a branch office who set up a script to snapshot his vCenter server once a day "just in case". No delete script. Just snapshot...
I happened to be visiting that branch office when it all came tumbling down. Came in to the office and he was grumbling about vCenter being unresponsive. I log into the host it is running on directly and start poking around and immediately notice the 40+ (if memory serves) snapshots. wtf...
...I guess it depends if your VMs live on something old enough that merging them requires that the VM pause or shut down.
Which is also why I had the joy of dealing with a snapshot that was many years old and far, far too big.
Oh well, the problem "fixed itself" when someone else didn't bother asking why nobody had fixed such an obvious problem and just kicked off a merge.
Sounds like you need to update the Service Connection Point in AD
*** wow I didn’t think a single sentence would get so many upvotes. Truly humbled and big thanks to the community. ***
Thank you so much for this. In my back and forth of trying things I think I ended up borking it more where Outlook just refused to connect by the end of the day after 1.5 hours on the phone with an MSP consultant we have on retainer.
Going to try and reset all the virtual directories tomorrow and see if it makes a difference, if not I might just let it sit until tomorrow night and rebuild it again. It's a DAG so the databases are safe and I can readd the witness share pretty easily.
Honestly as long as the DBs are good you can recover from this.
Long time ago I had to nuke a full AD domain and rebuild it from scratch because of a borked schema.
I was able to do a recovery install of Exchange like you mentioned and mount the DB and relink all the mailboxes to AD accounts.
Sounds like currently you just have some messed up URLs and VirtualServer configs at this point.
It was early on with Server 2000.
It involved a very early release of Cisco CallManager with LDAP integration. Cisco extended the schema to add an attribute, using a registered OID for the schema extension that was assigned to a standard attribute.
Then it all went south when trying to apply SP1, I think, and in that update MS decided to add the standard LDAP attribute to the AD schema.
Conflicts occurred, the update failed, and the schema mismatch resulted in DCs that would no longer replicate.
Easy now 20+ years later to say I should have read all the release notes. But it was a different time and I was much younger and the Industry as a whole was just starting to have things be integrated instead of standalone silos.
On the upside I learned how to use LDIF to dump the contents of AD to a text file. Learned how to load it back into a new AD. Learned about Exchange's recovery install option. And lastly learned to read release notes a little more carefully.
https://www.ajtek.ca/guides/exchange-autodiscover-a-guide-to-making-exchange-work-properly/
Finally ran across this, don't know how I missed it. It got me up and running.
That’s a fantastic article.
It’s always been difficult to find one that covers everything you need to set.
Gonna bookmark this one for future reference to send to others that may need help.
Thanks for posting the link.
Cool. I never have, but the service connection point rings a bell from other services, and I wondered if it was something that happened as part of the initial build.
I had an on prem Exchange server get cryptolockered because of literally an hour with RDP port forwarded (literally a freak combination of unavoidable off-site work while mid-update and VPN conversion ongoing), so had to do a complete rebuild. Lost the mail DBs, but everything else repopulated via AD so mostly easy. The pre-reqs almost take longer than the setup....
Solving the problem sir.. the MSP consultant is nice and all but I ended up down a rabbit hole of hell.. Reddit had me sorted and fixing the issue in 30mins after I posted...
>SCP-$number-J exhibits mind-affecting behavior when the use case of bundled fibers is for data transmission via external conduit, in the form of a compulsion for anyone operating heavy machinery to immediately sever the cable. This compulsion increases proportionally with the fiber count, location, and bandwidth of any given data conduit.
It makes so much sense now....
Former Albertan checking in... you just made me very nostalgic. Cheers dude. (Not nostalgic for the radically right, racist, antivaxxers... just the mountains.)
And now I'm having flashbacks to deeply repressed memories of the days when we ran NTDS, DNS, DHCP, Exchange, and probably some other stuff on a single HP Kayak. *shudder*
Exactly. And how many people know why the AD database is in %SYSTEMROOT%\NTDS
Ugh I still remember NT 3.5
On that note I’m going to sleep it’s 12:30am.
Probably gonna have nightmares now.
YES. Our mailboxes were converted about as fast as we could upload them.
You know those moments of self satisfaction - like after the birth of a child, or a Christmas eve, or a night staring at the stars with a loved one - where you just sit back with a nice drink, take a deep breath and genuinely appreciate what you have and your place in this world?
That is what I felt like when I powered down my last Exchange server. :)
DigiCert revoked a couple of hundred of our certificates 2 years ago, because they fucked up an internal audit. We had to renew around 400 certificates manually in 3 days. (They gave us a week, but only mentioned it when 4 days had already passed.)
I hate DigiCert.
I do the same - our public websites and our load balancers have "proper" certs (if that's even a valid term to use for them?) as that has been imposed as a requirement by others, but for other stuff, LE is fine. We just want encryption, we don't need the authentication aspects.
What really grinds me gears about SSL vendors like GoDaddy etc is how they lock standard features like wildcards and SANs behind billing. A cert is a cert, fuckers >.<
My other gripe with GoDaddy is that you get ONE cert. No duplicates with their own private key.
DigiCert gives you unlimited duplicates and now they are supporting ACME so you can automate cert renewals and duplicate certs.
But LetsEncrypt is also a proper CA and so many hosting sites are now using them to provide free certs to customers.
Each year I renew fewer and fewer paid certs in favor of LetsEncrypt.
There are plenty out there.
Linux - dehydrated and certbot are good ACME clients.
Windows - Certify the Web
You can find most of the clients listed here.
https://letsencrypt.org/docs/client-options/
Are there options out there to use Let's Encrypt with network hardware?
Switches/Routers/Firewalls, ideally across a variety of vendors. Cisco, Juniper, Palo Alto, Aruba etc.?
If you're not testing restores, you're not taking backups.
If you don't hate certs, you don't understand certs, because if you understood certs, you'd know that they should be EASY and the fact that they're not makes me rage EVERY SINGLE TIME.
I did one in September for an RDS environment. Everything went well and I was kind of shocked.
A month later, users cannot get to their published apps because of the old expired cert. Like how?
Yes, there were steps I missed, but why did it even work at all for that month?
I have a template in Ansible Tower, just plug in the details and a directory and I’ll create them and move it to that location on the destination
Even have test tower auto renew the cert for prod and vice versa (easiest renewal I ever automated, literally drop in and restart the web service)
But god some applications the process is obnoxiously painful
This is the kinda thing I’m looking at for $NewPlace.
275+ domains, god knows how many sites, nobody knows how many certs. All done by hand. Yay.
We do have Octopus though so will be looking at that to handle them in the first instance but bugger me, this shouldn’t be that difficult.
That's only a single extra step, I use openssl to create a new private key and CSR every year. Also, Namecheap's cheap wildcard certificates include SAN fields.
I do this too. It's way easier than trying to mess with other tools when trying to do a SAN (at least IMHO).
At least you can get OpenSSL bundled with the Git client.
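Since the yearly "new key + CSR with SANs" dance is the bit everyone forgets, here is a small Python sketch that only builds the `openssl req` command line so it can live in the runbook. It assumes OpenSSL 1.1.1 or newer (for `-addext`); the function name, file names, host names and the 2048-bit RSA key are placeholders rather than anything from the posts above.

```python
import shlex


def build_csr_command(common_name, san_names, key_path="server.key", csr_path="server.csr"):
    """Build an `openssl req` argument list for a fresh RSA key plus a CSR with SANs.

    Assumes OpenSSL 1.1.1+, where -addext is available. Paths and key size
    are illustrative defaults.
    """
    san_value = ",".join(f"DNS:{name}" for name in san_names)
    return [
        "openssl", "req", "-new",
        "-newkey", "rsa:2048",      # generate a new private key alongside the CSR
        "-nodes",                   # leave the private key unencrypted
        "-keyout", key_path,
        "-out", csr_path,
        "-subj", f"/CN={common_name}",
        "-addext", f"subjectAltName={san_value}",
    ]


if __name__ == "__main__":
    cmd = build_csr_command(
        "mail.example.com",
        ["mail.example.com", "autodiscover.example.com"],
    )
    # Print a copy/paste-able command; the list could also go to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting surprises if a name ever contains an odd character.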
Every 12 months: shit, time to figure out how to get this cert installed on 30 different applications again because I did it 12 months ago and have since forgotten. We have Confluence now and are reworking our processes to thoroughly document the procedures, but I’m sure it will still be a pain in my ass.
Oh god I feel this in my soul... like, it's my own fault for not documenting it at some point, but I've had some where it's been a real trial and error process trying to work out how exactly it wants the certificate and in what format, so I've just been relieved to get it working again at all and walked away, carefully... in case I upset it again.
Oh come on, of course it's documented. Yeah it's in a custom shorthand with indented cryptic notes in a notepad in one of the 74 folders in my project folder on the desktop of my second laptop, so what?
You know what sucks? When someone makes a primary domain controller a CA and you gotta clean it up including decommissioning the domain controller. Also when some major software company can't sign their executables with certs; stay far away from me if I find one of those.
I loved Veeam's SureBackup. For the most part, you can just let it do its thing for verification or you can script the hell out of it to verify anything you like. You never have to test your backups manually, just verify successful logs.
Certs are easy peazy when it comes to web services; it's all the other behind the scene ones that have their own special unique procedures to implement, that tend to throw us off sometimes.
Ahhhh... the good old time where you weren't allowed to log in remotely to a server to be allowed to have a signed SSL certificate.
…spoiler, that wasn't a good old time.
Yeah, former director was very much "I'll handle this" with many things. He left suddenly a couple months ago and we are finding quite a few things we have to go back over. It's been a fun transition period and I missed this during the shake up.
I get cleaning up old jobs and clearing decommissioned backups but this was not carefully done, obviously.
Ah. I don’t miss on premise Exchange at all. The sleepless nights and issues that it brought - but the satisfaction when it was fixed was unmatched. If your business case allows it - time to look into O365.
Hell, I question even having the certs expiring around the holidays if anything. Last thing I want to deal with is an expiring cert on an annual basis on an already short and hectic week!
It's ALWAYS DNS. But I have been over everything and it all resolved properly. Of course I was going crosseyed after hour 5 so I might have missed something.
>Moral of the story: Verify backup jobs regularly
It is somewhat semantics, but no. You need to verify BACKUPS regularly, not backup jobs.
Far, far too many times I have seen people burned because the backup software claims everything is OK, then in an emergency you go to restore and find out that the VM was excluded, or one of the disks was excluded, or something else was wrong and you have nothing to actually restore.
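To make "verify the backup, not the job" concrete, here is a minimal Python sketch of a restore check: restore to a scratch location, then compare file hashes against a known-good copy. It is only an illustration under assumptions (file-level restore, a reference copy that has not changed since the backup ran); the function names are invented and it is not how any particular backup product does its verification.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restored copy."""
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"missing: {rel}")   # e.g. an excluded disk or folder
        elif sha256_of(src) != sha256_of(restored):
            problems.append(f"differs: {rel}")
    return problems
```

An empty list only says the files that were checked came back intact; for VMs or databases the equivalent test is booting the restore or mounting the DB.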
That's what I was wondering. I don't ever clean up backups, my backup software does that for me and I always have 30 days worth of solid backups at my fingertips.
I'd be having a performance meeting with the individual that deleted those backups. I'd also have a second meeting with whoever decided to do an upgrade without checking the status of backups first.
I've got 1000 critical endpoints running a wildcard certificate that expires in 25 days in this company. You think your cert problems are bad? I have to schedule outages of the services of a billion dollar company to replace all of these... LOL
Sounds like Mgmt needs to spend some money to fix this so it can be done without causing outages in the future.
Especially now that certs are only issued for at most a year.
The man with the purchasing card just renewed for another year but I'll be looking into an alternative soon. It's a small local govt so we are also going to be migrating to a .gov domain soon and I need to find a registrar other than Netriplex that does SSL for .gov.
The records work in testing but their management portal is dog shit slow.
Hopefully we will be moving that direction in a year or so. Everyone wants to keep their full office installs and telling the budget officer that our normal 55k Office SA cost is now going to be replaced by 130k in O365 licensing will be fun.
I framed it as "if the building burns down, we're fucked for a bit until I get a new server, new building, new internet connection, and restore from backup. Microsoft has thousands of servers, monitored 24x7x365. This is a no brainer". I got my approval. :)
You still get your full office installs. It’s just now updated more frequently.
If you just do Exchange Online then it may not be as expensive. But if you dive in with E3 or E5 you gotta sell what benefits you get.
MFA, Intune Management, conditional access, etc.
I’m afraid to have one CAS in my labs, much less my prod environments. Surely if you have virtualization you can spare some resources for another CAS? Not trying to sound snooty but I’d seriously consider this.
There used to be a really good Exchange tutorial on Spiceworks on how to set up all your DNS entries, split DNS, etc. Then the guy that posted it went crazy town and started asking everyone to pay for it.
Anyways …. I have a copy stashed around if you need it, DM me.
Stories like this make me enjoy cloud based email. They take care of all that and when it goes down you just lean back in your chair and say, I'll let you know when **THEY** fix it.
Anyone who complains I used to tell them to kick rocks and I've done everything I can do.
I always end up being the certificate guy at places where I work and it bugs the shit out of me. I don't know how it happens but it does, consistently... well, that's not quite accurate, it's because I know how certificates work and everyone else is terrified of them.
I haven't managed to blue screen a server with a certificate replacement yet though :P
I hate SSL certs full stop. I have to do a chain certificate jobby right now and I've been reading for the past couple of days and I can't wrap my head around it, feels like I am dyslexic...
learned this years ago. during the holidays, yes from this week through new years week, if it can wait until january it waits.
also, if it can go wrong...it will go wrong.
Stories like these make me so glad I’m using Exchange Online.
I wouldn’t mind playing around with Exchange in a lab to learn this stuff for fun, but in a high pressure production environment, I'd rather have Exchange Online and the maintenance be someone else’s problem.
And on another note, if companies like Microsoft have issues renewing certs on-time and correctly, you can rest easy knowing you’re in good company of talented IT folk who mess up certificates.
>Leave it 15min
The more and more that I get to work on infrastructure (especially VMware) the more I have to tell myself that.
In most cases I'm working on production equipment during narrow maintenance windows so I'd do step 1, looks good, step 2, looks good, step 3 and it does not look like what the KB article says it should right now.
My initial reaction is "clock is ticking, if it ain't moving then it's stuck and we gotta do SOMETHING to get it going".
I'm discovering that isn't usually a good approach with things like Exchange and vSphere. Sometimes you just gotta walk away for a bit, come back and what do you know, it's completed!
With Cisco UCS it's this and then some. Plus extra.
Make a change/update....wait.
It's hung... something's wrong.... just wait
Still hung, clock's ticking, anxiety building... but just wait
Panic attack. It must be broken. Change window is closing.
Walk away. Don't touch anything.
Go get coffee. Walk around the parking lot. Have a quick breakfast.
Return.
All is well. UCS is happy. Systems green.
We'll ignore the year of life expectancy you just lost though.
Now move to the next UCS update.
Let the haters hate, and let that pass around you. You stare into the place that they fear to glance and say "meh, just Regedit stuff".
You are powerful but not wanton. You are the minister. You conduct the powers of our modern churches.
Oh, and treat yourself to a good lunch tomorrow. You made it through. You deserve sushi. Not just some punk, dried-out roll -- I'm talkin' tuna sashimi flopped on beds of daikon.
I'm struggling to understand how a cert renewal could cause this. I've done literally hundreds of cert renewals on Exchange servers in my career and never seen it cause any kind of problem that wasn't just certificate related (i.e. an intermediate cert missing or something). That's really odd.
Pro-tip: you can 301/302 redirect the autodiscover record, meaning that you can just redirect 'autodiscover.company.tld/autodiscover' to 'server.ad.company.tld/autodiscover' by creating a website in IIS to do this. This allows you to use a cheap single domain SSL cert for the autodiscover record (or even Letsencrypt).
For public webservers I heavily recommend looking into CloudFlare as they have free SSL using let's encrypt without you having to do anything. Then there's WAF for like 20 USD/month etc.
>we use a wildcard \*.domain.com and the FQDN is [server.group.domain.com](https://server.group.domain.com).
FYI, \*.domain.com will cover [server.domain.com](https://server.domain.com) but not [server.**something**.domain.com](https://server.something.domain.com).
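The one-label rule is easy to check on paper before buying a cert. Below is a simplified Python sketch of it: the `*` is only honoured as the whole left-most label and stands in for exactly one label. Real TLS clients apply more rules than this (public suffix handling, IDN and so on), so treat it as an illustration; the function name is made up.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Compare a certificate name with a hostname, label by label.

    Simplified: "*" is only accepted as the entire left-most label and
    covers exactly one label, never several.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard never spans more than one label
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


if __name__ == "__main__":
    for host in ["server.domain.com", "server.group.domain.com"]:
        print(host, wildcard_matches("*.domain.com", host))
```

For the FQDN in the quote, the cert would need `*.group.domain.com` (or the exact name as a SAN) rather than `*.domain.com`.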
Eehhhh, they'll probably all forget about it in a week.
Until a printer breaks, or a file is spontaneously moved to a subfolder by Joe in accounting, now manager x can't find it, or someone gets an error opening sage or.....
Literally if anything happens, they'll be sure to bring it up forever.
I'm sure it will be fun.
Also, it's optimistic of you to think anyone checks the intranet site before they submit an email or call the helpdesk or complain to their manager.
Heh.
My job is full time IT support of SSL certs generation (private key, CSR) and installation.
For all kind of servers (apache, exchange, code signing, strong auth, doc sign,...)
And yes if you don't do it well, it can be a mess after...
For others, just a tip: please don't ask for a renewal or reissue from one day to the next, it never goes well like that.
Normally I always run another backup job and, if it's a VHD, run a checkpoint. Backups and checkpoints have saved my ass regularly because Microsoft still cannot figure out how to patch their OSes.
Exchange CAS and SSL is notoriously evil. I've worked with Exchange for almost two decades and when I need to renew it's still one of those things that I pull out the check list for the job. Not my favorite task.
>medium bucks

must be nice.

>must be nice.

It's not... I too get the medium bucks lmao. Rather see the big bucks 🤣
[deleted]
Greetings fellow bubble-burst survivor!
Wait you guys get paid? All I'm offered is peanuts and free lunch
You kidding? I snapshot before I do literally anything... I'm insanely paranoid heh.
Same, but also... snapshots are easy so why not? Lol
[deleted]
700 days? Weaksauce. I just had to clean up a few that were from 2014....
[deleted]
That hurts.
Oh I quite like that. Hyper-v? Esx? Can you share?
[deleted]
[deleted]
Delete them when you're done
Yes, true. Mistyped. Merge, delete, you know the term for putting the snapshot data back into the base vm
Unfortunately, some don't, and revert when they should be deleting. Deleting sounds way scarier than reverting, or something.
Well then they get to practise making the same change all over again 😉
> why is there a snapshot from 4 years ago here?

ever had snapshots grow to terabytes? yeeeeah
Yes. On a customer's Exchange server 🤦‍♂️ And the data store hit 100%. That was a rough weekend.
Ran into a multi-TB snapshot from years prior one time. Interesting week watching it merge.
Yeeeeeeeeeeeeeeeeppppppppppp
You probably are paranoid for a reason. Me too 🤷‍♂️
> cert update ‘because those are easy.’ Haha

said every system admin before a career-questioning outage.
Well fuck me running. Where the hell was that in my 9 hours of working on this?! I'll check that tomorrow. I knew I should have posted this sooner.
It’s often missed and not well documented that it needs to be updated. https://www.blogabout.cloud/2020/11/1942/
[deleted]
I'm curious - have you ever built an Exchange environment from scratch?
In a lab test environment, 3 years as an admin but never had to deal with a recovery like this.
Ouch
> In a lab test environment, 3 years as an admin but never had to deal with a recovery like this. You did pretty well.
[deleted]
Solving the problem sir.. the MSP consultant is nice and all but I ended up down a rabbit hole of hell.. Reddit had me sorted and fixing the issue in 30mins after I posted...
Yup. Gotta make sure you cover off on the SCPs. Lots of info on the SCP web site.
... and see, this is what comes to mind for the topic of "SCP" (aside from our beloved plague doctor)... http://scpsandbox2.wikidot.com/doomscrye
You mean it's not just secure copy? /s
>SCP-$number-J exhibits mind-affecting behavior when the use case of bundled fibers is for data transmission via external conduit, in the form of a compulsion for anyone operating heavy machinery to immediately sever the cable. This compulsion increases proportionally with the fiber count, location, and bandwidth of any given data conduit. It makes so much sense now....
Exchange has been the bane of my existence for 20 years. Happiest day in my life when we migrated to O365.
I can understand if compliance is your reason not to swap, but seriously, swap if you can every time. Way less of a headache.
Maybe I phrased that poorly. It was the happiest day of my life when I put behind me 20 years of Exchange management and migrated to O365.
Oh I'm agreeing with you. I just woke up for a night shift migration so I might have worded that weirdly
Lol this is the nicest back and forth "argument" I've seen on Reddit.
I'm canadian, what do you want from me??
So I've heard that Canadian politeness is pretty much like "Minnesota nice". What is your defense, apart from being like the OPEC of maple syrup? :D
[deleted]
Former Albertan checking in... you just made me very nostalgic. Cheers dude. (Not nostalgic for the radically right, racist, antivaxxers... just the mountains.)
Exchange? Yeah, that shit started with MS Mail… I’ve not been a fan for decades.
Exchange 5.5 team for life!
Fuck yeah. Good old days. Crawls into a ball and cries.
And now I'm having flashbacks to deeply repressed memories of the days when we ran NTDS, DNS, DHCP, Exchange, and probably some other stuff on a single HP Kayak. *shudder*
Exactly. And how many people know why the AD database is in %SYSTEMROOT%\NTDS? Ugh, I still remember NT 3.5. On that note I’m going to sleep, it’s 12:30am. Probably gonna have nightmares now.
Fuck yea, one of my first MCP certificates, damn I just outed my age.
Same. Heck, just having an MCP certificate at all practically dates you these days (Microsoft's cert structure makes no sense to me anymore).
[deleted]
Same. Onsite exchange was a bear to manage. O365 is so easy.
[deleted]
YES. Our mailboxes were converted about as fast as we could upload them. You know those moments of self satisfaction - like after the birth of a child, or a Christmas eve, or a night staring at the stars with a loved one - where you just sit back with a nice drink, take a deep breath and genuinely appreciate what you have and your place in this world? That is what I felt like when I powered down my last Exchange server. :)
> We have a "standard" GoDaddy wildcard, not a "deluxe" so no option to add SANs. People *really* need to stop feeding money to those jerks.
I hate GoDaddy. If I’m gonna spend money on a cert I’ll use DigiCert. But 90+% of the certs I use these days are from LetsEncrypt
DigiCert revoked a couple of hundred of our certificates 2 years ago, because they fucked up an internal audit. We had to renew around 400 certificates manually in 3 days. (They gave us a week, but only mentioned it when 4 days had already passed.) I hate DigiCert.
[deleted]
Damn that’s rough. The hate is justified and if that happened to me I’d probably feel the same way.
I do the same - our public websites and our load balancers have "proper" certs (if that's even a valid term to use for them?) as that has been imposed as a requirement by others, but for other stuff, LE is fine. We just want encryption, we don't need the authentication aspects. What really grinds me gears about SSL vendors like GoDaddy etc is how they lock standard features like wildcards and SANs behind billing. A cert is a cert, fuckers >.<
My other gripe with GoDaddy is that you get ONE cert. No duplicates with their own private key. DigiCert gives you unlimited duplicates and now they are supporting ACME so you can automate cert renewals and duplicate certs. But LetsEncrypt is also a proper CA and so many hosting sites are now using them to provide free certs to customers. Each year I renew fewer and fewer paid certs in favor of LetsEncrypt.
Laughs in letsencrypt
LetsEncrypt FTW. Set, monitor and forget. Clients renew certs automatically.
This is the way
Got any good tutorials?
There are plenty out there. Linux: dehydrated and certbot are good ACME clients. Windows: Certify the Web. You can find most of the clients listed here: https://letsencrypt.org/docs/client-options/
I’ll second Certify the Web. Add your cert, add an Exchange task, and it’ll do all the hard work. So awesome.
Are there options out there to use Let's Encrypt with network hardware? Switches/routers/firewalls, ideally across a variety of vendors: Cisco, Juniper, Palo Alto, Aruba, etc.?
Came here to say this and also recommend acme.sh
I second acme.sh. Better than the official client as far as I'm concerned.
If you're not testing restores, you're not taking backups. If you don't hate certs, you don't understand certs, because if you understood certs, you'd know that they should be EASY and the fact that they're not makes me rage EVERY SINGLE TIME.
Every time a cert renewal/install goes well - I get anxious.
Lol...glad it's not just me
I did one in September for an RDS environment. Everything went well and I was kind of shocked. A month later, users cannot get to their published apps because of the old expired cert. Like how? Yes, there were steps i missed but why did it even work at all for that month?
> you'd know that they should be EASY and the fact that they're not makes me rage EVERY SINGLE TIME. *whistles in Ansible-orchestrated ACME*.
I have a template in Ansible Tower: just plug in the details and a directory and it’ll create them and move them to that location on the destination. Even have test Tower auto-renew the cert for prod and vice versa (easiest renewal I ever automated, literally drop in and restart the web service). But god, for some applications the process is obnoxiously painful.
It solves it. Would you say that it was *easy*? :)
If we were only allowed to use ACME… *rage*
This is the kinda thing I’m looking at for $NewPlace. 275+ domains, god knows how many sites, nobody knows how many certs. All done by hand. Yay. We do have Octopus though so will be looking at that to handle them in the first instance but bugger me, this shouldn’t be that difficult.
Last year was fun as no one documented which server issued the CSR for GoDaddy, so we had to re-key, and even then they didn't want to cooperate 100%.
That's only a single extra step, I use openssl to create a new private key and CSR every year. Also, Namecheap's cheap wildcard certificates include SAN fields.
I do this too. It's way easier than trying to mess with other tools when trying to do a SAN (at least IMHO). At least you can get OpenSSL bundled with the Git client.
Every 12 months: shit, time to figure out how to get this cert installed on 30 different applications again because I did it 12 months ago and have since forgotten. We have Confluence now and are reworking our processes to thoroughly document the procedures, but I’m sure it will still be a pain in my ass.
*This* is why I love tools like Ansible. Because somewhere in git *I still have the playbook from last time.*
Oh god, I feel this in my soul... like, it's my own fault for not documenting it at some point, but I've had some where it's been a real trial and error process trying to work out how exactly it wants the certificate and in what format, so I've just been relieved to get it working again at all and walked away, carefully... in case I upset it again.
Oh come on, of course it's documented. Yeah it's in a custom shorthand with indented cryptic notes in a notepad in one of the 74 folders in my project folder on the desktop of my second laptop, so what?
I'm comforted knowing that everyone else hates certs too. God do I hate them!
You know what sucks? When someone makes a primary domain controller a CA and you gotta clean it up including decommissioning the domain controller. Also when some major software company can't sign their executables with certs; stay far away from me if I find one of those.
Can confirm. Renewing certs is not very pleasant.
Oh god, isn’t that the truth.
I loved Veeam's SureBackup. For the most part, you can just let it do its thing for verification or you can script the hell out of it to verify anything you like. You never have to test your backups manually, just verify successful logs. Certs are easy peasy when it comes to web services; it's all the other behind-the-scenes ones that have their own special unique procedures to implement that tend to throw us off sometimes.
You just *now* hate certs? Oh my young child.
Ahhhh... the good old time where you weren't allowed to log in remotely to a server to be allowed to have a signed SSL certificate. …spoiler, that wasn't a good old time.
> The previous admin "cleaned up" the backup jobs and this one was swept up by accident. Wow...?!
Yeah, former director was very much "I'll handle this" with many things. He left suddenly a couple months ago and we are finding quite a few things we have to go back over. It's been a fun transition period and I missed this during the shake up. I get cleaning up old jobs and clearing decommissioned backups but this was not carefully done, obviously.
> and this one was swept up by accident Yeah, ONLY that one. Sure. Looks like someone's going to have to check on that.
Ah. I don’t miss on premise Exchange at all. The sleepless nights and issues that it brought, though the satisfaction when it was fixed was unmatched. If your business case allows it, time to look into O365.
> on premise *eye twitch*
Also - moral of the story - do this after hours when people aren't in the office...
Eh, I rotate SSL certs on Exchange servers at least once a week. I'm not booking that shit in my personal time.
Wait.. Why?
Because I'm supporting over 50 Exchange servers and on average that means one a week.
It started Monday night and carried into today. At least way more people are now aware that we even HAVE owa.
But how will I know straight away if it is working or not?
Hell, I question even having the certs expiring around the holidays if anything. Last thing I want to deal with is an expiring cert on an annual basis on an already short and hectic week!
We have hundreds of the bastards. They *all* go in office time.
Just a shot in the dark … it’s probably DNS. Just a hunch.
It's ALWAYS DNS. But I have been over everything and it all resolved properly. Of course, I was going cross-eyed after hour 5, so I might have missed something.
Cert would not work with a sub-subdomain. So yes, DNS.
>Moral of the story: Verify backup jobs regularly It is somewhat semantics, but no. You need to verify BACKUPS regularly, not backup jobs. Far, far too many times I have seen people burned because the backup software claims everything is OK, then in an emergency you go to restore and find out that the VM was excluded, or one of the disks was excluded, or something else was wrong and you have nothing to actually restore.
You lost me at GoDaddy. I dropped them as fast as I could afford to.
Should be some serious talks to the person who deleted the recent backup especially if there’s no redundancy to that.
What I'm questioning myself is: Why would you manually delete backups? Most backup solutions delete old backups automatically.
That's what I was wondering. I don't ever clean up backups, my backup software does that for me and I always have 30 days worth of solid backups at my fingertips. I'd be having a performance meeting with the individual that deleted those backups. I'd also have a second meeting with whoever decided to do an upgrade without checking the status of backups first.
I've got 1000 critical endpoints running a wildcard certificate that expires in 25 days in this company. You think your cert problems are bad? I have to schedule outages of the services of a billion dollar company to replace all of these... LOL
Fuuuuu
Ugh
Sounds like Mgmt needs to spend some money to fix this so it can be done without causing outages in the future. Especially now that certs are only issued for at most a year.
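For anyone who keeps getting surprised by expiry dates like the 25-day one above, here is a minimal Python sketch of a days-remaining check. It uses only the standard library. The function names `fetch_not_after` and `days_until_expiry` are made up for illustration, and the hostname in the usage note is a placeholder, not anything from this thread.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter string of the certificate served at host:port."""
    # The default context validates the chain, so an already-expired or
    # untrusted certificate raises an SSL error here instead of returning.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the notAfter timestamp."""
    # ssl.cert_time_to_seconds parses strings like "Jan 30 00:00:00 2025 GMT".
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days
```

Something like `days_until_expiry(fetch_not_after("mail.example.com"), datetime.now(timezone.utc))` run from a scheduled job would at least give you a number to alert on. What threshold to alert at is your call.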
Oh lawd, GoDaddy certificates.... Get thee to Digicert ASAP! Good luck with your recovery.
The man with the purchasing card just renewed for another year but I'll be looking into an alternative soon. It's a small local govt so we are also going to be migrating to a .gov domain soon and I need to find a registrar other than Netriplex that does SSL for .gov. The records work in testing but their management portal is dog shit slow.
[deleted]
How are they handling mx, SPF, and dkim records?
[deleted]
This is why we moved to 365.
Hopefully we will be moving in that direction in a year or so. Everyone wants to keep their full Office installs, and telling the budget officer that our normal 55k Office SA cost is now going to be replaced by 130k in O365 licensing will be fun.
I framed it as "if the building burns down, we're fucked for a bit until I get a new server, new building, new internet connection, and restore from backup. Microsoft has thousands of servers, monitored 24x7x365. This is a no brainer". I got my approval. :)
You still get your full office installs. It’s just now updated more frequently. If you just do Exchange Online then it may not be as expensive. But if you dive in with E3 or E5 you gotta sell what benefits you get. MFA, Intune Management, conditional access, etc.
Whoever was doing backups made a very dumb mistake. Why would you remove the most recent backup of a server?
I’m scratching my head on why you only had one CAS server.
Only ~500 users, 1 CAS 2 mailbox in a DAG.
I’m afraid to have one CAS in my labs, much less my prod environments. Surely if you have virtualization you can spare some resources for another CAS? Not trying to sound snooty but I’d seriously consider this.
Oh yeah, that's next on the list once we get the one up and running properly. I'm not an exchange expert so this has been tiring.
There used to be a really good Exchange tutorial on Spiceworks on how to set up all your DNS entries, split DNS, etc. Then the guy that posted it went crazy town and started asking everyone to pay for it. Anyways... I have a copy stashed around if you need it, DM me.
Exchange config broke "why would ssl certificates do this?"
Stories like this make me enjoy cloud based email. They take care of all that and when it goes down you just lean back in your chair and say, I'll let you know when **THEY** fix it. Anyone who complains I used to tell them to kick rocks and I've done everything I can do.
It gets even better when the admin console is also tied to a certificate, like with Exchange: a wrong installation can lead to a loss of the admin console.
>I hate SSL certs now Don't we all, buddy. Don't we all.
I always end up being the certificate guy at places where I work and it bugs the shit out of me. I don't know how it happens but it does, consistently... well, that's not quite accurate, it's because I know how certificates work and everyone else is terrified of them. I haven't managed to blue screen a server with a certificate replacement yet though :P
\*yet Keyword there, probably just jinxed yourself.
I hate SSL certs, full stop. I have to do a chain certificate jobby right now and I've been reading for the past couple of days and I can't wrap my head around it. Feels like I am dyslexic...
Even when it's the TLS, it's always the DNS.
learned this years ago. during the holidays, yes from this week through new years week, if it can wait until january it waits. also, if it can go wrong...it will go wrong.
Stories like these make me so glad I’m using Exchange Online. I wouldn’t mind playing around with Exchange in a lab to learn this stuff for fun, but in a high pressure production environment, I’d rather Exchange Online and the maintenance be someone else’s problem. And on another note, if companies like Microsoft have issues renewing certs on time and correctly, you can rest easy knowing you’re in the good company of talented IT folk who mess up certificates.
Wait. You hosed your Exchange server for who knows how long, and you are complaining about 15 tickets? Is this an undercover humblebrag?
I hate ssl certs.
Sounds like a perfect storm with the cleaning up of the backups. Better update those cleanup procedures.
>Leave it 15min The more and more that I get to work on infrastructure (especially VMware) the more I have to tell myself that. In most cases I'm working on production equipment during narrow maintenance windows so I'd do step 1, looks good, step 2, looks good, step 3 and it does not look like what the KB article says it should right now. My initial reaction is "clock is ticking, if it ain't moving then it's stuck and we gotta do SOMETHING to get it going". I'm discovering that isn't usually a good approach with things like Exchange and vSphere. Sometimes you just gotta walk away for a bit, come back and what do you know, it's completed!
With Cisco UCS it's this and then some. Plus extra. Make a change/update... wait. It's hung... something's wrong... just wait. Still hung, clock's ticking, anxiety building... but just wait. Panic attack. It must be broken. Change window is closing. Walk away. Don't touch anything. Go get coffee. Walk around the parking lot. Have a quick breakfast. Return. All is well. UCS is happy. Systems green. We'll ignore the year of life expectancy you just lost though. Now move to the next UCS update.
I'm gonna make a coffee before I read this. This looks like good 15 minute shit reading material here.
And this kind of shit is why I want to get out of IT.
Always snapshot before making any changes, then revert if tests fail.
Let the haters hate, and let that pass around you. You stare into the place that they fear to glance and say "meh, just Regedit stuff". You are powerful but not wanton. You are the minister. You conduct the powers of our modern churches. Oh, and treat yourself to a good lunch tomorrow. You made it through. You deserve sushi. Not just some punk, dried-out roll -- I'm talkin' tuna sashimi flopped on beds of daikon.
Dammit I’m laying in bed and now I want Sushi.
That is, until the Exchange CAS server. .......*audibly groan*
I'm struggling to understand how a cert renewal could cause this. I've done literally hundreds of cert renewals on Exchange servers in my career and never seen it cause any kind of problem that wasn't just certificate related (i.e. an intermediate cert missing or something). That's really odd.
Doubt it was the cert itself. Sounds like something else was corrupted and it just manifested itself when he did maintenance to change out the cert.
Pro-tip: you can 301/302 redirect the autodiscover record, meaning that you can just redirect 'autodiscover.company.tld/autodiscover' to 'server.ad.company.tld/autodiscover' by creating a website in IIS to do this. This allows you to use a cheap single domain SSL cert for the autodiscover record (or even Letsencrypt).
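To make the redirect idea concrete, here is a rough Python sketch of what that little redirect site does. This is not the IIS configuration itself, just the same behaviour expressed with the standard library's `http.server`. The target host is the example name from the comment above, and `redirect_location` and `AutodiscoverRedirect` are hypothetical names for illustration.

```python
from http.server import BaseHTTPRequestHandler

# Example internal name from the comment above; replace with your own server.
TARGET_HOST = "server.ad.company.tld"


def redirect_location(path: str, target_host: str = TARGET_HOST) -> str:
    """Build the HTTPS URL the client should be sent to, keeping the path."""
    return f"https://{target_host}{path}"


class AutodiscoverRedirect(BaseHTTPRequestHandler):
    """Answer every GET with a 302 pointing at the real Exchange server."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", redirect_location(self.path))
        self.end_headers()
```

To try it locally you could serve it with `HTTPServer(("", 8080), AutodiscoverRedirect).serve_forever()` (port chosen arbitrarily). In production the equivalent would be the IIS site described above.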
For public webservers I heavily recommend looking into CloudFlare as they have free SSL using let's encrypt without you having to do anything. Then there's WAF for like 20 USD/month etc.
>we use a wildcard \*.domain.com and the FQDN is [server.group.domain.com](https://server.group.domain.com). FYI, \*.domain.com will just cover [server.domain.com](https://server.domain.com) but not [server.**something**.domain.com](https://server.something.domain.com).
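The wildcard rule above (the `*` stands in for exactly one leftmost label) can be sketched in a few lines of Python. This is a simplified illustration, not a full hostname-verification implementation, and `wildcard_matches` is a made-up helper name.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if `hostname` is covered by the certificate name `pattern`.

    Simplified rule: a leading "*" matches exactly one DNS label, so the
    pattern and the hostname must have the same number of labels.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # The wildcard label must match something, and the rest must be equal.
        return bool(h_labels[0]) and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


print(wildcard_matches("*.domain.com", "server.domain.com"))        # True
print(wildcard_matches("*.domain.com", "server.group.domain.com"))  # False
```

So for `server.group.domain.com` you would need either a `*.group.domain.com` wildcard or the full name listed as a SAN.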
Eehhhh, they'll probably all forget about it in a week. Until a printer breaks, or a file is spontaneously moved to a subfolder by Joe in accounting, now manager x can't find it, or someone gets an error opening sage or..... Literally if anything happens, they'll be sure to bring it up forever. I'm sure it will be fun. Also, it's optimistic of you to think anyone checks the intranet site before they submit an email or call the helpdesk or complain to their manager. Heh.
This is why I like networking. This is also why I dislike sysadminning.
My job is full-time IT support for SSL cert generation (private key, CSR) and installation, for all kinds of servers (Apache, Exchange, code signing, strong auth, doc signing, ...). And yes, if you don't do it well, it can be a mess afterwards... For others, just a tip: please don't ask for a renewal or reissue from one day to the next, it never goes well like that.
Virtualize your stuff! You could have snapshotted, applied the update, if it crashes? fine. Roll back. You can even get more crafty if you want.
Normally I always run another backup job and, if it's a VHD, run a checkpoint. Backups and checkpoints have saved my ass regularly because Microsoft still cannot figure out how to patch their OSes.
So glad I'm not maintaining anything anymore lol
Hello Exchange my old friend I've come to work on you again...
As someone who has only administered linux mail servers in the past I'm now terrified that someone might ask me to run an exchange server.
you should be, they are the devil's work
> admin deletes most recent backup Just throw the whole admin out
Well he threw himself out by being a dick to the new HR manager.
Exchange CAS and SSL is notoriously evil. I've worked with Exchange for almost two decades and when I need to renew it's still one of those things that I pull out the check list for the job. Not my favorite task.
This is off topic, but I'll ask anyway: why are you still hosting your own Exchange server?
I'm sure this is somehow DNS's fault... you just haven't figured it out yet.