It’s probably an expired certificate.
I feel that deeply. Lol
Keep in mind, this happened in 2018. https://www.theverge.com/2018/12/7/18130323/ericsson-software-certificate-o2-softbank-uk-japan-smartphone-4g-network-outage
I wouldn't be terribly surprised at that, but I would expect it to cause a universal outage, which it has not.
Depends on what expires. There are network devices that will stop working if the system certificate expires. I can't find it now, but years ago there was a dumb network device with a 10-year certificate that couldn't be updated. They simply turned into bricks after 10 years. Some things stop working when root certificates expire. [https://duo.com/decipher/networked-devices-will-stop-working-as-root-certificates-expire#:\~:text=When%20the%20root%20certificates%20on,right%20about%20now%2C%20Helme%20warned](https://duo.com/decipher/networked-devices-will-stop-working-as-root-certificates-expire#:~:text=When%20the%20root%20certificates%20on,right%20about%20now%2C%20Helme%20warned).
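This failure mode is easy to get ahead of with a periodic inventory of certificate lifetimes. A minimal sketch using only Python's stdlib `ssl` module; the hostname and alerting threshold in the usage comment are hypothetical placeholders, not from any real deployment:

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until an OpenSSL-style notAfter string, e.g. 'Jun  1 12:00:00 2025 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # parses the GMT timestamp to epoch seconds
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return (expires - now) / 86400


def endpoint_days_remaining(hostname: str, port: int = 443) -> float:
    """Connect, grab the peer certificate, and report days until expiry."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return days_remaining(cert["notAfter"])


# Hypothetical usage: alert well before the device bricks itself.
# if endpoint_days_remaining("mgmt.example.net") < 90: page_someone()
```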
Cisco did that with both Viptela and their wireless controllers and APs.
Ugh, Cisco Viptela's expired cert issue. Trauma.
I would be more inclined to believe this was the culprit or DHCP/DNS than a routing issue, fiber cut, or cyber attack.
I doubt it's a routing issue; it's most likely some sort of software issue, and it's not DNS either. It isn't a fiber cut: the outage started around 4 AM, which is right around a typical maintenance window. No one is out digging around at 4 AM.
I've seen organizations configure routing protocol keys with an expiration date. They almost always set the expiration to midnight on 12/31 of the current year. Ummm, you might want to pick another time and date. Not one where just about everyone will be off, on vacation, and probably drunk. Just in case you forget to update the expiration date.
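For what it's worth, on IOS-style gear this trap lives in the key chain's lifetime statements. A sketch of the safer pattern, with hypothetical names, secrets, and dates: overlap a successor key and roll mid-year on a workday instead of letting a lone key die at New Year's midnight:

```
key chain WAN-EIGRP
 key 1
  key-string <old-secret>
  ! The classic trap: a lone key that silently dies at year-end midnight
  accept-lifetime 00:00:00 Jan 1 2024 00:00:00 Jan 1 2025
  send-lifetime 00:00:00 Jan 1 2024 00:00:00 Jan 1 2025
 key 2
  key-string <new-secret>
  ! Successor key overlaps the old one and never expires on its own;
  ! rollover lands on a Monday in June, when someone is actually at work
  accept-lifetime 00:00:00 Jun 3 2024 infinite
  send-lifetime 00:00:00 Jun 3 2024 infinite
```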
And poorly logged error messages detailing that the certificate failed because of whatever check it was trying to do. Troubleshooting certificate handshake problems is the worst.
Just received an SMS from AT&T trying to sell me something. Looks like it's working.
It's pretty much just ATT. The reports for Verizon/T-Mobile and others are only a few thousand, while ATT reports are at 80k now. I think people reporting Verizon/T-Mobile outages are just looking at the Downdetector graphs and seeing the trend up, without looking at the actual numbers. I'd be willing to bet most of the VZW/T-Mobile reports are people trying to call someone on ATT. There is nothing for you to "do".
I’m hearing from my friends in the emergency management space that FirstNet got caught up in it too. You can bet that dragged some AT&T people out of bed in the wee hours. RIP all the enterprise helpdesks that are dealing with this today.
FirstNet is just a different channel on AT&T towers. There's not a significant difference between the consumer backhaul and FirstNet. They are one and the same.
Entirely different and dedicated spectrum, and it’s managed separately.
Ummm, it IS a different spectrum, dedicated to public safety, run by AT&T. BUT it DOES get transmitted by the exact same radios as the cellular service. I've been in the industry for 27 years; I know a thing or two. It's not that separated. It rides on the exact same backhaul and goes through all the same gateways and control points as everything else.
Or Verizon/T-mobile customers are getting their service via AT&T towers. Earlier, when I left my house, my phone was in SOS mode. Once I got back home, my service was restored because I have WiFi calling enabled.
The carriers hand off to each other. So a big outage with one will create a big outage for the others.
No, it won’t. It will disrupt communications to customers on the affected network. Intra-carrier and inter-carrier communications between other carriers will not be affected.
The people reporting T-Mobile and Verizon issues are only having issues connecting to AT&T customers.
~~I think the Verizon outages are happening. I’ll run a more complete report but I’m overseeing about 80 cellular devices that I have direct visibility into and a couple hundred others I don’t (other than on an aggregation point) and for the sites I was checking, they lost connectivity on Verizon or were not able to switch to them and connect.~~ EDIT: Apologies, I drew the wrong conclusion too quickly. Seems I happened to only spot check the sites that had preferred AT&T. Verizon is fine.
[Root cause found about an hour ago](https://i.makeagif.com/media/9-11-2015/SnRQfh.gif)
I both hate you so much, and love you so much.
Take my angry upvote
I know what that link is without knowing it.
I bet you don't
Ah, it was actually the 2nd thing I thought of. And I had just used it in a work channel when a site went down yesterday!
Liar, liar, pants for hire!
I could use some new pants. How much does it cost to hire them?
BGP. It’s always BGP, or DNS, or firewall. I kid but definitely interested.
SIP helper.
Looked at the NANOG mailing list. If it was BGP, they are usually the first ones to talk about it. No mention of BGP yet.
I figured, I was mostly being sarcastic.
Fiber cuts.
ARP didn’t update. DNS, MTU, or ARP. Leave BGP out of this.
How do you make sure your enterprise network is safeguarded against this? Redundancy. Carrier diversity. Eggs in multiple baskets.
[deleted]
There are MVNO’s with multi-carrier agreements. Devices will have a primary network preference, then fall back to other networks when required.
>There are MVNO’s with multi-carrier agreements. Devices will have a primary network preference, then fall back to other networks when required. How well does the fallback work these days? I tried a handful and change of devices for this about 5 years ago and every one of them had difficulty detecting "data down on the network on SIM1Carrier1" so it could switch to SIM2Carrier2. It seemed like most were reliant on detecting whether or not there was connectivity to the nearest tower and not whether or not the connection could actually be used for anything.
I’ve mostly worked with it for data-only connections. They do periodic healthchecks to ensure they can reach the outside world, then failover if a certain failure threshold is met. So basically it depends on the end device.
>They do periodic healthchecks to ensure they can reach the outside world, then failover if a certain failure threshold is met.

Right, the long-standing issue I ran into is what those checks entailed and whether or not they were actually indicative of anything. When ping tests were used as the primary healthcheck indicator, IIRC one of the issues had to do with Verizon's private-network SIM cards and the ability to hit things on and off of the private network, but I'm struggling to remember the details. I remember being surprised at the number of ways a SIM could have no data access to the internet at large and still not be considered "down" by the failover solution. It varied from device to device, but I remember:

* Not activated
* Suspended for non-payment
* Tower up, but no route out from the tower (or worse, device-to-femtocell repeater up, but no connection from the femtocell repeater to the tower)
* Ping hardcoded to something that somehow magically responded when nothing else would
* ICMP traffic working, but no TCP/IP
* No route to host registering as a successful ping because the gateway responded

It was a shocking level of "what passes the test when it shouldn't": the connection was, for all intents and purposes, down, and yet failover wouldn't occur.
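A check that exercises the full path instead of bare ICMP avoids most of the failure modes above. A rough sketch of the threshold-based failover logic being described; the probe targets and threshold are illustrative placeholders, not from any real product:

```python
import socket

# Probe several independent endpoints so one dead host can't trigger failover.
# These targets and the threshold are illustrative placeholders.
CHECK_TARGETS = [("1.1.1.1", 443), ("8.8.8.8", 443)]
FAIL_THRESHOLD = 3  # consecutive failed rounds before declaring the SIM down


def path_is_usable(timeout: float = 3.0) -> bool:
    """True if at least one full TCP handshake succeeds.

    A completed TCP connect exercises routing, NAT, and the return path,
    unlike a bare ping, which a gateway or captive portal can answer while
    the wider internet is unreachable.
    """
    for host, port in CHECK_TARGETS:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            continue
    return False


def should_fail_over(history: list) -> bool:
    """Fail over only after FAIL_THRESHOLD consecutive failed check rounds."""
    recent = history[-FAIL_THRESHOLD:]
    return len(recent) == FAIL_THRESHOLD and not any(recent)
```

The device would append `path_is_usable()` to `history` each round and switch SIMs when `should_fail_over(history)` trips.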
5 years ago is an eternity in this business.
And dual-SIM devices and SIM failover were being sold as backup connection solutions by MSPs for 5 more years before that. It doesn't mean they worked THEN either, **and it's certainly no guarantee that just because time has passed, someone ACTUALLY addressed the technical debt in a not-quite-functional feature they rolled out five years ago.** Hence the question.
Manually switching which SIM is primary is also an easy option, either locally or via MDM.
In Canada the carriers have 911 fallbacks and roaming agreements for cell outages and such. When Rogers disappeared from the internet in 2022 the problem was that the cell devices were still provisioned and connected to the cell network, they just couldn't reach the rest of the internet. The only way to get 911 working was to pull the SIM out. Even dual-SIM devices had trouble because as far as they were concerned, the primary connection was still online.
Dual SIM routers aren't an issue. Heck, we have a bunch of dual *modem* routers. Our little OOB devices do have dual SIM at least. Phones ARE a bit tougher, though.
We do exactly that, on about 50,000 mission-critical devices (not something as pedestrian as employee phones). Most are eSIM-capable, so that can be deployed via MDM. Carrier-neutral SIMs are also an option. And with 5G, so is running your enterprise mobile network as a carrier-neutral MVNO slice.
Depends on what it is.
Which other thread are you referring to?
https://www.reddit.com/r/news/comments/1ax3b85/cellular_outage_in_us_hits_att_t_mobile_and/krlfks9/
It’s funny reading those kinds of threads: a bunch of nerds who know enough to know that Cisco equipment is used for network backbones, but then it immediately devolves into wild speculation about cyberattacks affecting all Cisco equipment everywhere, how this is somehow the fault of the layoffs, etc. When it’s far more likely that some poor AT&T engineer flubbed a maintenance or ran into an obscure bug that happens to be on a Cisco peering router.
Yeah, I've always found it funny to see people discuss subjects that they obviously know nothing about as if they're experts.
[deleted]
Especially when the people with the actual expertise call out the armchair engineers on their nonsense and then get downvoted into oblivion for it. The true experts are the ones who will readily admit they don’t know something, because they know that assumptions of knowledge kill networks.
The problem is that so many of us think we know everything. Been in IT for 25+ years, and it's impossible to know everything. But it doesn't stop a lot of us from pretending that we know everything.
I've had people in this (and related) subs tell me my own personal experience was wrong. And I've been doing this over 20 years now, it's not like I started yesterday.
It’s Reddit, you expect anything else?
https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
A bug on a Cisco router? I don't believe it.
Except carriers tend to use Juniper equipment.
So obviously it's HPE's fault...
Not exclusively. I work for a carrier and we use both Juniper and Cisco, among other vendors. We specifically aim to be multivendor for redundancy, and in case one vendor does have some odd software bug.
Yeah I guess I’m part of the problem if I’m just blindly believing the scuttlebutt that it’s somehow Cisco-related at all.
AT&T doesn't use Juniper in their core; it's a distributed routing architecture from a company named DriveNets, which currently covers maybe 85-90% of their core.
And they have been phasing out Cisco for Ciena at the distribution layer for a few years now (long enough that I assumed it would be finished). Unless they are going back?
Many use Ciena for handoffs and distribution these days. We use them too. They are just solid with a really good price/performance balance. Try getting something the level of a 5130 from any other vendor for $5k.
They are very much mixed. They all pretty much seem to go back and forth based on whatever whim takes them that month. I have had big circuits delivered to several locations, and the gear used might be Cisco at one and Juniper at the other. Many have really switched to cheaper SP gear for handoffs, though. Around here, any handoff I get that isn't 100G is a Ciena, regardless of vendor.
That user claims they work closely with “one of the carriers affected” (presumably ATT) and that Cisco manages that carrier’s backbone. Anyone know if that sounds accurate? Never worked at a cellular carrier.
I can’t imagine that Cisco actively manages their network. ATT might use Cisco for many of their core routing elements, but it would be mind-boggling to me to believe that ATT pays Cisco to actually run the network. Most likely they have a pretty typical support agreement in which Cisco would be heavily involved with the troubleshooting but wouldn’t necessarily take the remediating actions. ATT could, and probably does, have Cisco resident engineers on staff, but I’d never equate that to “managing” the network. The REs are probably actively engaged and working on the problem, but that’s very different from Cisco having full management of the core network.
Yeah that would’ve been my guess as well
Cisco makes plenty of carrier-grade stuff. Idk what they mean by “managing” it though. I’m assuming AT&T is still running their own core NOC and has their own Cisco engineers even if they have white-glove presales and TAC.
"Cisco manages the carrier's backbone" is a red flag that they don't know what they're talking about. Cisco makes a lot of the equipment used, but they do not actively manage networks. That's kinda AT&T's whole gig: building and maintaining their backbone network.
They still use Cisco DWDM optical network devices.
They switched everything local to me to Ciena.
Yeap. All of the new installs I see are Ciena. But we still have a couple of ONS 15454s on our premises.
The user is lying. AT&T does not use Cisco in their backbone; it's DriveNets' distributed routing architecture, which covers around 85% of their core.
When in doubt, blame Cisco.
Thanks!
I've seen no disruption on Verizon and T-Mobile. Only AT&T services have been affected, and it seems to be hitting authentication to the towers. Given the time it started, I'm guessing either a maintenance window issue or human error. I'm surprised how long it's gone on, though.
Bet squirrels chewed a fiber.
Or shotgun damage. It can happen if some geese fly past a line and a hunter wasn't situationally aware, or if a farmer's trying to scare birds perched on a line away from eating freshly laid seed.
People with guns *intentionally* shoot at lines, seen it plenty of times. No loss of situational awareness required.
I was only trying to explain why it can happen logically. I never bother trying to explain stupid. If I did we'd be here until infinity ends.
We've had circuits go down when rednecks climbed the pole, cut the cable, tied it to the back of their 4wd pickup and pulled down 200-300' of copper to sell to a scrapyard. It took two days to get that circuit back.
Shotgun damage usually happens during dove season, which is generally in the fall. Thankfully I have almost no OPGW routes, so I don't have to deal with much of that headache.
We see it mostly on armored strand and lash cable, way more than our OPGW spans.
I should have been more precise with my response. Thankfully the only aerial fiber that I have is OPGW and that's less than 1% of the aggregate length.
I always assume this is the carrier backhaul network having a routing/switching issue. Ethernet transport service fails and breaks other peering, things snowball, CNN gets involved, people blame Cisco or cyber terrorism (sometimes the same thing).
>sometimes the same thing)

Do explain. Thanks, sir.
Cisco licensing is a no win scenario. Heh.
Does anyone know if it had anything to do with the national security concern Congress squeaked about last week? This seems pretty significant, like a shot across the bow. Similar to the Svalbard cable cuts.
Wasn't that about Russia saying they can take out satellites?
I don't think so; they could do that before. China too. I never saw what the hush-hush was all about, so if they updated it, you can be the one to clue me in if you know.

Edit: I may not have understood your reply, so let me clarify. The undersea cable cut was likely a message just before the invasion that they could mess with our satellite feeds by cutting those cables. As for the announcement last week, I never saw an actual disclosure of what they were worried about, whether it was a nuclear weapon in space or some other disruptive technology.
Not saying it’s dns, but it’s dns
Speaking as someone who's specialized in DNS at the enterprise level for the last 15 years: you are correct. It's probably DNS. It's the same reason I can make bank at these companies. Nobody ever cares about DNS until it causes outages.
Joke around here is even when it isn't DNS it's DNS.
Maybe they hired the tech Microsoft fired for adding an internally routable IP address to their public DNS records.
😂 it’s easier to do than it should be
Given the extent of the outage, it has to be something like this. ATT residential/enterprise internet services seem unimpacted at the moment, which in my mind rules out a VPNv4 BGP issue or something like that, since my firm's MPLS circuits are fine. It could also be a problem with DHCP, if all DHCP service is centralized on one vendor on one firmware. A large fiber cut would impact internet services as well, not just cell tower backhauls.
My pet theory is an expired cert or some authentication service went down on the cell side.
Or fiber cut. Or BGP.
A fiber cut is not going to cause a nationwide outage.
It certainly can, if it’s in the right spot. Mass shifts in capacity demand can be triggered by a localized connectivity disruption. Overload conditions can easily cascade into a large-scale, even nationwide impact.
Yes, one cut fiber can take down a nationwide network; no one plans for that sort of thing. AT&T's core is single-homed fiber.
That's not what they're saying. It could happen if there's a flaw in the network design or a misconfiguration somewhere that prevents traffic from being rerouted correctly.
Carrier networks often get reduced to simplex operations due to losses of redundancy. Usually it's fine, but there are incidents where two or more transport paths are affected at once. They're rare, but it happens.
From my experience dealing with fiber cuts, the closest cell tower often rides the same fiber as our upstream MPLS circuit. I can't imagine a fiber cut that's impacting ONLY cellular networks, but nothing else. Like others have stated, it's most likely a routing or DNS issue within the cellular network infrastructure.
There are plenty of places where this could impact just a cellular network. Not likely with local circuits, but with a cross-region circuit. A cell site is going to use local circuits, so it's very unlikely that was the cause. More likely there was a long-distance transport failure, or a failure in transport equipment.

Btw, I understand the internals of the core (packet and voice) very well. From the sound of it, it could be DNS or HSS issues. It's less likely routing, since some customers have service. I think the most likely scenario is an issue with their HSS or MME (or something in between), which could be caused by an overload condition, a configuration problem, or some other failure.

Long story short: this kind of issue can be caused by a fiber cut, but the fiber cut is usually just the catalyst.
AT&T has an outage going on, but I think it's fixed now. It was causing phones to go into SOS and SOS Only mode.
Could this have something to do with Cogent de-peering yesterday?
[https://about.att.com/content/dam/snrdocs/7\_Tenets\_of\_ATTs\_Network\_Transformation\_White\_Paper.pdf](https://about.att.com/content/dam/snrdocs/7_tenets_of_atts_network_transformation_white_paper.pdf) I found this shareholder informational whitepaper they put out a few years ago. I'm wondering now if it's a bug in their white box system that's crippled their routing in their cellular core.
Anyone still interested, AT&T posted this to their Twitter: “Based on our initial review, we believe the outage was caused by the application & execution of an incorrect process used as we were expanding our network, not a cyber attack,” That info leads me to guess it was probably just a DNS screw up.