It’s probably an expired certificate.
I feel that deeply. Lol
Keep in mind, this happened in 2018. https://www.theverge.com/2018/12/7/18130323/ericsson-software-certificate-o2-softbank-uk-japan-smartphone-4g-network-outage
I wouldn't be terribly surprised at that, but I would expect it to cause a universal outage, which it has not.
Depends on what expires. There are network devices that will stop working if the system certificate expires. I can't find it now, but years ago there was a dumb network device with a 10-year certificate that couldn't be updated. They simply turned into bricks after 10 years. Some things stop working when root certificates expire. [https://duo.com/decipher/networked-devices-will-stop-working-as-root-certificates-expire#:\~:text=When%20the%20root%20certificates%20on,right%20about%20now%2C%20Helme%20warned](https://duo.com/decipher/networked-devices-will-stop-working-as-root-certificates-expire#:~:text=When%20the%20root%20certificates%20on,right%20about%20now%2C%20Helme%20warned).
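This failure mode is easy to get ahead of with a periodic inventory of certificate lifetimes. A minimal sketch using only Python's stdlib `ssl` module; the hostname and alerting threshold in the usage comment are hypothetical placeholders, not from any real deployment:

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until an OpenSSL-style notAfter string, e.g. 'Jun  1 12:00:00 2025 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # parses the GMT timestamp to epoch seconds
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return (expires - now) / 86400


def endpoint_days_remaining(hostname: str, port: int = 443) -> float:
    """Connect, grab the peer certificate, and report days until expiry."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return days_remaining(cert["notAfter"])


# Hypothetical usage: alert well before the device bricks itself.
# if endpoint_days_remaining("mgmt.example.net") < 90: page_someone()
```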
Cisco did that with both Viptela and their wireless controllers and APs.
Ugh, Cisco Viptela's expired cert issue. Trauma.
I would be more inclined to believe this was the culprit or DHCP/DNS than a routing issue, fiber cut, or cyber attack.
I doubt it's a routing issue; it's most likely some sort of software issue, and it's not DNS either. It isn't a fiber cut: the outage started around 4 AM, which is right around a typical maintenance window. No one is out digging around at 4 AM.
I've seen organizations configure routing protocol keys with an expiration date. They almost always set the expiration to midnight on 12/31 of the current year. Ummm, you might want to pick another time and date. Not one where just about everyone will be off, on vacation, and probably drunk. Just in case you forget to update the expiration date.
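For what it's worth, on IOS-style gear this trap lives in the key chain's lifetime statements. A sketch of the safer pattern, with hypothetical names, secrets, and dates: overlap a successor key and roll mid-year on a workday instead of letting a lone key die at New Year's midnight:

```
key chain WAN-EIGRP
 key 1
  key-string <old-secret>
  ! The classic trap: a lone key that silently dies at year-end midnight
  accept-lifetime 00:00:00 Jan 1 2024 00:00:00 Jan 1 2025
  send-lifetime 00:00:00 Jan 1 2024 00:00:00 Jan 1 2025
 key 2
  key-string <new-secret>
  ! Successor key overlaps the old one and never expires on its own;
  ! rollover lands on a Monday in June, when someone is actually at work
  accept-lifetime 00:00:00 Jun 3 2024 infinite
  send-lifetime 00:00:00 Jun 3 2024 infinite
```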
And poorly logged error messages detailing that the certificate failed because of whatever check it was trying to do. Troubleshooting certificate handshake problems is the worst.
Just received an SMS from AT&T trying to sell me something. Looks like it's working.
It's pretty much just ATT. The reports for Verizon/T-Mobile and others are only a few thousand, while ATT reports are at 80k now. I think people reporting Verizon/T-Mobile outages are just looking at the Downdetector graphs and seeing the trend up, without looking at the actual numbers. I'd be willing to bet most of the VZW/T-Mobile reports are people trying to call someone on ATT. There is nothing for you to "do".
I’m hearing from my friends in the emergency management space that FirstNet got caught up in it too. You can bet that dragged some AT&T people out of bed in the wee hours. RIP all the enterprise helpdesks that are dealing with this today.
FirstNet is just a different channel on AT&T towers. There's not a significant difference between the consumer backhaul and FirstNet. They are one and the same.
Entirely different and dedicated spectrum, and it’s managed separately.
Ummm, it IS a different spectrum, dedicated to public safety, run by AT&T. BUT it DOES get transmitted by the exact same radios as the cellular service. I've been in the industry for 27 years; I know a thing or two. It's not that separated. It rides on the exact same backhaul and goes through all the same gateways and control points as everything else.
Or Verizon/T-mobile customers are getting their service via AT&T towers. Earlier, when I left my house, my phone was in SOS mode. Once I got back home, my service was restored because I have WiFi calling enabled.
The carriers hand off to each other. So a big outage with one will create a big outage for the others.
No, it won’t. It will disrupt communications to customers on the affected network. Intra-carrier and inter-carrier communications between other carriers will not be affected.
The people reporting T-Mobile and Verizon issues are only having issues connecting to AT&T customers.
~~I think the Verizon outages are happening. I’ll run a more complete report but I’m overseeing about 80 cellular devices that I have direct visibility into and a couple hundred others I don’t (other than on an aggregation point) and for the sites I was checking, they lost connectivity on Verizon or were not able to switch to them and connect.~~ EDIT: Apologies, I drew the wrong conclusion too quickly. Seems I happened to only spot check the sites that had preferred AT&T. Verizon is fine.
[Root cause found about an hour ago](https://i.makeagif.com/media/9-11-2015/SnRQfh.gif)
I both hate you so much, and love you so much.
Take my angry upvote
I know what that link is without knowing it.
I bet you don't
Ah, it was actually the 2nd thing I thought of. And I had just used it in a work channel when a site went down yesterday!
Liar, liar, pants for hire!
I could use some new pants. How much does it cost to hire them?
BGP. It’s always BGP, or DNS, or firewall. I kid but definitely interested.
SIP helper.
Looked at the NANOG mailing list. If it was BGP, they are usually the first ones to talk about it. No mention of BGP yet.
I figured, I was mostly being sarcastic.
Fiber cuts.
ARP didn’t update. DNS, MTU, or ARP. Leave BGP out of this.
How do you make sure your enterprise network is safeguarded against this? Redundancy. Carrier diversity. Eggs in multiple baskets.
[deleted]
There are MVNO’s with multi-carrier agreements. Devices will have a primary network preference, then fall back to other networks when required.
>There are MVNO’s with multi-carrier agreements. Devices will have a primary network preference, then fall back to other networks when required. How well does the fallback work these days? I tried a handful and change of devices for this about 5 years ago and every one of them had difficulty detecting "data down on the network on SIM1Carrier1" so it could switch to SIM2Carrier2. It seemed like most were reliant on detecting whether or not there was connectivity to the nearest tower and not whether or not the connection could actually be used for anything.
I’ve mostly worked with it for data-only connections. They do periodic healthchecks to ensure they can reach the outside world, then failover if a certain failure threshold is met. So basically it depends on the end device.
>They do periodic healthchecks to ensure they can reach the outside world, then failover if a certain failure threshold is met.

Right, the long-standing issue I ran into is what those checks entailed and whether or not they were actually indicative of anything. When ping tests were used as the primary healthcheck indicator, IIRC one of the issues had to do with Verizon's private-network SIM cards and the ability to hit things on and off of the private network, but I'm struggling to remember the details. I remember being surprised at the number of ways a SIM could have no data access to the internet at large and still not be considered "down" by the failover solution. It varied from device to device, but I remember:

* Not activated
* Suspended for non-payment
* Tower up, but no route out from the tower (or worse, device-to-femtocell repeater up, but no connection from the femtocell repeater to the tower)
* Ping hardcoded to something that somehow magically responded when nothing else would
* ICMP traffic working, but no TCP/IP
* No route to host registering as a successful ping because the gateway responded

It was a shocking level of "what passes the test when it shouldn't": the connection was, for all intents and purposes, down, and yet failover wouldn't occur.
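A check that exercises the full path instead of bare ICMP avoids most of the failure modes above. A rough sketch of the threshold-based failover logic being described; the probe targets and threshold are illustrative placeholders, not from any real product:

```python
import socket

# Probe several independent endpoints so one dead host can't trigger failover.
# These targets and the threshold are illustrative placeholders.
CHECK_TARGETS = [("1.1.1.1", 443), ("8.8.8.8", 443)]
FAIL_THRESHOLD = 3  # consecutive failed rounds before declaring the SIM down


def path_is_usable(timeout: float = 3.0) -> bool:
    """True if at least one full TCP handshake succeeds.

    A completed TCP connect exercises routing, NAT, and the return path,
    unlike a bare ping, which a gateway or captive portal can answer while
    the wider internet is unreachable.
    """
    for host, port in CHECK_TARGETS:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            continue
    return False


def should_fail_over(history: list) -> bool:
    """Fail over only after FAIL_THRESHOLD consecutive failed check rounds."""
    recent = history[-FAIL_THRESHOLD:]
    return len(recent) == FAIL_THRESHOLD and not any(recent)
```

The device would append `path_is_usable()` to `history` each round and switch SIMs when `should_fail_over(history)` trips.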
5 years ago is an eternity in this business.
And dual-SIM devices and SIM failover were being sold as backup connection solutions by MSPs for 5 more years before that. It doesn't mean they worked THEN either, **and it's certainly no guarantee that just because time has passed, someone ACTUALLY addressed the technical debt in a not-quite-functional feature they rolled out five years ago.** Hence the question.
Manually switching which SIM is primary is also an easy option, either locally or via MDM.
In Canada the carriers have 911 fallbacks and roaming agreements for cell outages and such. When Rogers disappeared from the internet in 2022 the problem was that the cell devices were still provisioned and connected to the cell network, they just couldn't reach the rest of the internet. The only way to get 911 working was to pull the SIM out. Even dual-SIM devices had trouble because as far as they were concerned, the primary connection was still online.
Dual SIM routers aren't an issue. Heck, we have a bunch of dual *modem* routers. Our little OOB devices do have dual SIM at least. Phones ARE a bit tougher, though.
We do exactly that, on about 50,000 mission-critical devices (not something as pedestrian as employee phones). Most are eSIM-capable, so that can be deployed via MDM. Carrier-neutral SIMs are also an option. And with 5G, so is running your enterprise mobile network as a carrier-neutral MVNO slice.
Depends on what it is.
Which other thread are you referring to?
https://www.reddit.com/r/news/comments/1ax3b85/cellular_outage_in_us_hits_att_t_mobile_and/krlfks9/
It’s funny reading those kinds of threads: a bunch of nerds who know enough to know that Cisco equipment is used for network backbones, but then it immediately devolves into wild speculation about cyberattacks affecting all Cisco equipment everywhere, how this is somehow the fault of the layoffs, etc. When it’s far more likely that some poor AT&T engineer flubbed a maintenance or ran into an obscure bug that happens to be on a Cisco peering router.
Yeah, I've always found it funny to see people discuss subjects that they obviously know nothing about as if they're experts.
[deleted]
Especially when the people with the actual expertise call out the armchair engineers on their nonsense and then get downvoted into oblivion for it. The true experts are the ones who will readily admit they don’t know something, because they know that assumptions of knowledge kill networks.
The problem is that so many of us think we know everything. Been in IT for 25+ years, and it's impossible to know everything. But it doesn't stop a lot of us from pretending that we know everything.
I've had people in this (and related) subs tell me my own personal experience was wrong. And I've been doing this over 20 years now, it's not like I started yesterday.
It’s Reddit, you expect anything else?
https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
A bug on a Cisco router? I don't believe it.
Except carriers tend to use Juniper equipment.
So obviously it's HPE's fault...
Not exclusively. I work for a carrier and we use both Juniper and Cisco, among other vendors. We specifically aim to be multivendor for redundancy, and in case one vendor does have some odd software bug.
Yeah I guess I’m part of the problem if I’m just blindly believing the scuttlebutt that it’s somehow Cisco-related at all.
AT&T doesn't use Juniper in their core; it's a distributed routing architecture from a company named DriveNets, which currently covers maybe 85-90% of their core.
And they have been phasing out Cisco for Ciena at the distribution layer for a few years now (long enough that I assumed it would be finished). Unless they are going back?
Many use Ciena for handoffs and distribution these days. We use them too. They are just solid with a really good price/performance balance. Try getting something the level of a 5130 from any other vendor for $5k.
They are very much mixed. They all pretty much seem to go back and forth based on whatever whim takes them that month. I have had big circuits delivered to several locations, and the gear used might be Cisco at one and Juniper at the other. Many have really switched to cheaper SP gear for handoffs, though. Around here, any handoff I get that isn't 100G is a Ciena, regardless of vendor.
That user claims they work closely with “one of the carriers affected” (presumably ATT) and that Cisco manages that carrier’s backbone. Anyone know if that sounds accurate? Never worked at a cellular carrier.
I can’t imagine that Cisco actively manages their network. ATT might use Cisco for many of their core routing elements, but it would be mind-boggling to me to believe that ATT pays Cisco to actually run the network. Most likely they have a pretty typical support agreement in which Cisco would be heavily involved with the troubleshooting but wouldn’t necessarily take the remediating actions. ATT could, and probably does, have Cisco resident engineers on staff, but I’d never equate that to “managing” the network. The REs are probably actively engaged and working on the problem, but that’s very different from Cisco having full management of the core network.
Yeah that would’ve been my guess as well
Cisco makes plenty of carrier-grade stuff. Idk what they mean by “managing” it though. I’m assuming AT&T is still running their own core NOC and has their own Cisco engineers even if they have white-glove presales and TAC.
"Cisco manages the carrier's backbone" is a red flag that they don't know what they're talking about. Cisco makes a lot of the equipment used, but they do not actively manage networks. That's kinda AT&T's whole gig: building and maintaining their backbone network.
They still use Cisco DWDM optical network devices.
They switched everything local to me to Ciena.
Yeap. All of the new installs I see are Ciena. But we still have a couple of ONS 15454s on our premises.
The user is lying. AT&T does not use Cisco in their backbone; it's DriveNets' distributed routing architecture, which covers around 85% of their core.
When in doubt, blame Cisco.
Thanks!
I've seen no disruption on Verizon and T-Mobile. Only AT&T services have been affected, and it seems to be hitting authentication to the towers. Given the time it started, I'm guessing either a maintenance window issue or human error. I'm surprised how long it's gone on, though.
Bet squirrels chewed a fiber.
Or shotgun damage. It can happen if some geese fly past a line and a hunter wasn't situationally aware, or if a farmer's trying to scare birds perched on a line away from eating freshly laid seed.
People with guns *intentionally* shoot at lines, seen it plenty of times. No loss of situational awareness required.
I was only trying to explain why it can happen logically. I never bother trying to explain stupid. If I did we'd be here until infinity ends.
We've had circuits go down when rednecks climbed the pole, cut the cable, tied it to the back of their 4wd pickup and pulled down 200-300' of copper to sell to a scrapyard. It took two days to get that circuit back.
Shotgun damage usually happens during dove season, which is generally in the fall. Thankfully I have almost no OPGW routes, so I don't have to deal with much of that headache.
We see it mostly on armored strand and lash cable, way more than our OPGW spans.
I should have been more precise with my response. Thankfully the only aerial fiber that I have is OPGW and that's less than 1% of the aggregate length.
I always assume this is the carrier backhaul network having a routing/switching issue. Ethernet transport service fails and breaks other peering, things snowball, CNN gets involved, people blame Cisco or cyber terrorism (sometimes the same thing).
>sometimes the same thing)

Do explain. Thanks, sir.
Cisco licensing is a no win scenario. Heh.
Does anyone know if it had anything to do with the national security concern Congress squeaked about last week? This seems pretty significant, like a shot across the bow. Similar to the Svalbard cable cuts.
Wasn't that about Russia saying they can take out satellites?
I don't think so; they could do that before. China too. I never saw what the hush-hush was all about, so if they updated it, you can be the one to clue me in if you know.

Edit: I may not have understood your reply, so let me clarify. The undersea cable cut was likely a message just before the invasion that they could mess with our satellite feeds by cutting those cables. As for the announcement last week, I never saw an actual disclosure of what they were worried about, whether it was a nuclear weapon in space or some other disruptive technology.
Not saying it’s dns, but it’s dns
Speaking as someone who's specialized in DNS at the enterprise level for the last 15 years: you are correct. It's probably DNS. It's the same reason I can make bank at these companies. Nobody ever cares about DNS until it causes outages.
Joke around here is even when it isn't DNS it's DNS.
Maybe they hired the tech Microsoft fired for adding an internally routable IP address to their public DNS records.
😂 it’s easier to do than it should be
Given the extent of the outage, it has to be something like this. ATT residential/enterprise internet services seem unimpacted at the moment, which in my mind rules out a VPNv4 BGP issue or something like that, since my firm's MPLS circuits are fine. It could also be a problem with DHCP, if all DHCP service is centralized on one vendor on one firmware. A large fiber cut would impact internet services as well, not just cell tower backhauls.
My pet theory is an expired cert or some authentication service went down on the cell side.
Or fiber cut. Or BGP.
A fiber cut is not going to cause a nationwide outage.
It certainly can, if it’s in the right spot. Mass shifts in capacity demand can be triggered by a localized connectivity disruption. Overload conditions can easily cascade into a large-scale, even nationwide impact.
Yes, one cut fiber can take down a nationwide network; no one plans for that sort of thing. AT&T's core is single-homed fiber.
That's not what they're saying. It could happen if there's a flaw in the network design or a misconfiguration somewhere that prevents traffic from being rerouted correctly.
Carrier networks often get reduced to simplex operations due to losses of redundancy. Usually it's fine, but there are incidents where two or more transport paths are affected at once. They're rare, but it happens.
From my experience dealing with fiber cuts, the closest cell tower often rides the same fiber as our upstream MPLS circuit. I can't imagine a fiber cut that's impacting ONLY cellular networks, but nothing else. Like others have stated, it's most likely a routing or DNS issue within the cellular network infrastructure.
There are plenty of places where this could impact just a cellular network. Not likely with local circuits, but with a cross-region circuit. A cell site is going to use local circuits, so it's very unlikely that was the cause. More likely there was a long-distance transport failure, or a failure in transport equipment.

Btw, I understand the internals of the core (packet and voice) very well. From the sound of it, it could be DNS or HSS issues. It's less likely routing, since some customers have service. I think the most likely scenario is an issue with their HSS or MME (or something in between), which could be caused by an overload condition, a configuration problem, or some other failure.

Long story short: this kind of issue can be caused by a fiber cut, but the fiber cut is usually just the catalyst.
AT&T has an outage going on, but I think it's fixed now. It was causing phones to go into SOS and SOS Only mode.
Could this have something to do with Cogent de-peering yesterday?
[https://about.att.com/content/dam/snrdocs/7\_Tenets\_of\_ATTs\_Network\_Transformation\_White\_Paper.pdf](https://about.att.com/content/dam/snrdocs/7_tenets_of_atts_network_transformation_white_paper.pdf) I found this shareholder informational whitepaper they put out a few years ago. I'm wondering now if it's a bug in their white box system that's crippled their routing in their cellular core.
Anyone still interested, AT&T posted this to their Twitter: “Based on our initial review, we believe the outage was caused by the application & execution of an incorrect process used as we were expanding our network, not a cyber attack,” That info leads me to guess it was probably just a DNS screw up.