Telco Network Failure: Optus 2023 & More

Sat dish? Mate, mainstream mobile phones available today can already communicate directly with the satellite network. It works.

So in the normal course of events, the mobile phone communicates with towers etc., but in an emergency it can communicate directly with a satellite. The only working telecommunications infrastructure needed anywhere near the person is then a mobile phone with a charged battery, and the need for a charged battery applies to pretty much any way in which you might improve the resilience of the mobile network.

Very true. It’s probably not reached many phone users yet, and is really an emergency option. So far. Who knows what the options will be in a few years.

Mental xxxx. My mindset was on creating a replacement for the tower via satellite. Wouldn’t a transition to phone-direct satellite services require everyone to have a new or recent phone, plus a supporting regulatory system and willing telcos?

If the government subsidised them, it might be much cheaper than an overhaul of the entire mobile service, and less costly than what is being tossed about to encourage water, power and other savings?

Mea culpa.

Which is a perfectly reasonable way to quickly re-establish connectivity for an area where fixed mobile towers have all been damaged / disconnected by fire or flood. Emergency connectivity kits consisting of portable mobile towers that connect via satellite and can be set up anywhere at short notice.

Yes, some recent phones can connect via satellite, and all the major chip manufacturers have versions of sat connectivity either on the market or available soon. Not all new phones yet have satellite connectivity, though, and the ones that do tend to be the pricey ones.

The Motorola Defy Satellite Link would be more affordable for those with shallower pockets. It’s an external device that attaches to an existing smartphone and works with iOS 14+ and Android 10+ phones.

I have recent real-world experience using an iPhone 14 between Brisbane and Townsville (QLD) on the Bruce Highway, the Gregory Development Road and other inland routes. The iPhone had a full-fat postpaid Telstra SIM and an Optus-network MVNO service. I used a dash mount.

Note that the following is anecdotal, the observations of one person only. The emergency satellite service signal did appear when there was no mobile coverage from either network. Despite most of the inland areas being open, rolling country, the satellite signal was reported only intermittently. It may be the future, but device and real-world testing is required.

According to the article above,

the challenge is antennas. The aim is to get around the thick bulky antennas used on current generation satellite phones.

Apple itself achieved this in iPhone 14 through a complex antenna redesign, although users had to point their iPhone in the direction of an available geostationary satellite. The antenna has limitations.

Might that explain why your iPhone was seeing the satellite signal intermittently?

Hmm. A typical mobile phone connecting to a geo satellite using an internal antenna?
I assume that is a typo in the article. Those satellites are over 35,000 km away.

Not only would your phone need to be held very precisely in one specific direction, the transmit power would also need to be substantially higher than a normal phone’s.

You might be able to send a short burst of messages before the battery ran out.
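
To put a rough number on the distance problem: free-space path loss grows with the square of distance, so every tenfold increase in distance costs another 20 dB. The sketch below compares a nearby tower, a LEO satellite and a geostationary one. The 1.6 GHz frequency and the 5 km tower and 1,400 km LEO distances are illustrative assumptions, not figures from this thread; 35,786 km is the standard geostationary altitude, used here as the best case (satellite directly overhead). Antenna gain, atmosphere and obstructions are all ignored.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB between two isotropic antennas."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


# Illustrative, assumed values (not taken from the discussion).
FREQ_HZ = 1.6e9
DISTANCES_M = {
    "cell tower (5 km)": 5e3,
    "LEO satellite (1,400 km)": 1.4e6,
    "GEO satellite (35,786 km)": 35.786e6,
}

if __name__ == "__main__":
    tower_loss = fspl_db(DISTANCES_M["cell tower (5 km)"], FREQ_HZ)
    for name, distance in DISTANCES_M.items():
        loss = fspl_db(distance, FREQ_HZ)
        print(f"{name:28s} {loss:6.1f} dB  (+{loss - tower_loss:5.1f} dB vs tower)")
```

Under these assumptions the LEO link loses roughly 49 dB more than the tower link and the GEO link roughly 77 dB more, i.e. the GEO signal arrives about 50 million times weaker. That is the gap a directional antenna and extra transmit power would have to make up.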

The app helps you to do that.

Also, the satellites are not geostationary. They are Low Earth Orbit (LEO), which means they are much much closer (than geostationary) but means that they move.

I believe so - although I would call it a braino.
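
On the LEO point: the lower a satellite orbits, the faster it goes round. Kepler’s third law gives the period of a circular orbit as T = 2π·sqrt(a³/μ), where a is the orbital radius and μ is Earth’s gravitational parameter. The 1,400 km altitude below is an assumed, illustrative LEO altitude, not a figure from the thread.

```python
import math

MU_EARTH_M3_S2 = 3.986004418e14  # Earth's standard gravitational parameter
EARTH_RADIUS_M = 6_378_137.0     # equatorial radius (WGS 84)


def orbital_period_s(altitude_m: float) -> float:
    """Period of a circular orbit at an altitude above Earth's equatorial radius."""
    a = EARTH_RADIUS_M + altitude_m  # orbital radius measured from Earth's centre
    return 2 * math.pi * math.sqrt(a**3 / MU_EARTH_M3_S2)


if __name__ == "__main__":
    print(f"LEO (assumed 1,400 km): {orbital_period_s(1_400e3) / 60:.0f} minutes")
    print(f"GEO (35,786 km): {orbital_period_s(35_786e3) / 3600:.2f} hours")
```

That works out to a little under two hours per orbit at 1,400 km, against about 23.93 hours (one sidereal day) at geostationary altitude, which is why a LEO satellite drifts across the sky instead of sitting in one spot.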

Without an independent review and field test, it may or may not. If it does need additional consumer actions, that’s one more memory burden for the last of my grey cells to retain and recall. More critically, it assumes one is in a position to be able to carry out the required actions.

Shouldn’t the marketing make it very, very bold and clear that a satellite connection is not assured? In my mind, the ability to connect through a satellite in an emergency needs more than a small asterisk and a qualifying line of fine print.

There is more to it. What “can use” says to the average consumer should not be too difficult to interpret.

Any tower taken out by a disaster removes the service of whichever telcos use it. Some towers are already shared, and in more rural areas there may currently be only one provider anyway. The idea that nationalised infrastructure would be more affected, just because a tower has a number of telcos using it, is a bit of a furphy: if there are three separate telco towers instead of one, the number of towers lost is more than likely to be tripled. I would think that many more towers, or fibre to communities, will be required to reduce congestion, remove blackspots and provide decent rural/remote coverage (even in unprofitable zones). I don’t think many telcos have an interest in providing infrastructure in unprofitable areas; Telstra does, as it gets funded to provide some connection, which can be quite basic connectivity.

Why does a tower have to be serviced by fossil fuels? Can’t it be renewable powered? Still, if power is lost to an area, then any services that rely on that power supply fail. Perhaps infrastructure needs to be designed for the conditions it may face rather than to the slimmest budget, particularly as we suffer more impacts from climate change.

Renewables already power telecom equipment in a fair number of remote locations. I have visited communities where the only phone was a payphone, powered entirely by solar panels on the roof of the box, which connected to a nearby microwave tower. Nowadays coverage comes from a mix of macro cell towers, small cell installations, or sufficient proximity to macro cell coverage (all of these are still towers of some sort). Often the only provider is Telstra, but in some cases Optus is included too. For reference, excluding Darwin, Palmerston, Tennant Creek, Katherine, Alice Springs, Nhulunbuy and Jabiru, the number of villages, communities and towns covered by 3G/4G was 72 in 2021, and the number covered by 4G in 2022 was 187. That is still far short of most remote locations that have residents, and many of those have no connection to power, so they rely on gen sets and/or renewables for any power at all.

If the tower has good clearance from potentially flammable material, it will often survive bushfires. Cyclones can be a different matter, and in floods, depending on elevation, the tower and its power supply may be affected.

Satellites suffer a huge disadvantage when line of sight (LOS) is obscured, and this can come about from several environmental conditions, including smoke, sleet, snow and rain. So in some natural disasters, LOS will often be lost during the most dangerous phases, when communication is vitally important, and this may continue for days after the initial disaster. After a disaster, there are mobile solutions that can be placed in areas to supply connectivity if infrastructure has been damaged to the point of being offline. What I am trying to say is that there are always circumstances where 100% uptime cannot be guaranteed: having a backup system reduces the possible impacts but does not entirely remove them.

This makes one wonder how the routing priority for a 000 call works when the SIM shows ‘no service’. It was not a ‘no signal’ spot, as evidenced by the fact that ‘a few people pulled over and were able to contact triple-0’. Is it a case of it not working as expected and advertised, or was it an anomaly? Does anyone know if and how they test it?

I don’t, but I do know that it is difficult to test. 🙂

I think the government should be putting pressure on the three mobile networks to get to the bottom of this (rather than Senators showboating in the Senate). I have a theory but that is not much better than pointless speculation.

For completeness of continuing the story: Optus CEO Kelly Bayer Rosmarin resigns 'in the best interest of Optus' following nationwide outage - ABC News

and Former CEO Kelly Bayer Rosmarin's resignation a 'sacrifice' that does little to benefit Optus, says national anti-corruption commissioner - ABC News

I would think that the emergency call provisions of the GSM mobile network would be very easy to test.
Take any mobile phone sold in Australia, with the SIM in or out, dial 000 or 112 (which in Australia translates to 000), and see what happens. Now, that may well annoy the emergency services if everyone tried it, but the authorities could and should be testing this.

The general rules are:

  1. Connection to home network provider available on a reachable tower, then connect via that.
  2. If the first fails, then connect via roaming. I don’t think the Australian mobile networks have roaming agreements between themselves.
  3. If the second fails, then connect to ANY network available for emergency calls, regardless of whether a SIM is present.
  4. If the third fails, then you are stuffed. There are no networks available.

Test 1a: this requires making ‘test 1’ fail, so that the phone moves on to test 2.
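
The four rules above can be modelled as a simple priority list, which also makes test 1a concrete: mark the home network as unreachable and check that the call moves on. This is a simplified sketch of the rules as described in this post, not the actual 3GPP network-selection procedure; the `Cell` type and `pick_network_for_000` function are hypothetical. Note that `reachable` here only means the phone can hear the tower.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Cell:
    operator: str    # e.g. "Optus", "Telstra" (illustrative)
    reachable: bool  # can the phone hear this tower at all?


def pick_network_for_000(
    cells: Sequence[Cell],
    home_operator: Optional[str],  # None means no SIM in the phone
    roaming_partners: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Return the operator an emergency call would use, or None."""
    reachable = [c for c in cells if c.reachable]
    if home_operator is not None:
        # Rule 1: the home network, if one of its towers is reachable.
        for cell in reachable:
            if cell.operator == home_operator:
                return cell.operator
        # Rule 2: a roaming partner of the home network.
        for cell in reachable:
            if cell.operator in roaming_partners:
                return cell.operator
    # Rule 3: any reachable network, with or without a SIM.
    if reachable:
        return reachable[0].operator
    # Rule 4: no network available at all.
    return None


if __name__ == "__main__":
    # Test 1a: the home network (Optus) is out of range, Telstra is not.
    cells = [Cell("Optus", reachable=False), Cell("Telstra", reachable=True)]
    print(pick_network_for_000(cells, home_operator="Optus"))
```

In the test 1a scenario this picks Telstra. Because the model only looks at radio reachability, an Optus tower that is on air but cut off from the rest of the network would still win rule 1.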

When there is a SIM and a network connection, the emergency call goes through the connected network. When one is out of coverage for their network provider, the call goes through any network that has coverage (noting that if there is a blackspot with no network coverage, calls can’t be made). If one has a phone without a SIM, emergency calls can still be made through any network that has coverage where the call is made.

From listening to an expert late last night, it appears that during the Optus outage, mobile phones continued to be paired to an Optus mobile tower, making the phone believe it could call through the Optus network. With the outage, this wasn’t possible. Call attempts would have been made through the Optus network, but failed. The emergency call system isn’t designed to try other means of connection when one fails, such as a second attempt to call through an alternative network where there is coverage. Removing an Optus SIM may have worked, as it would have disconnected/unpaired the phone from the local Optus tower.

‘Isn’t designed to’ might have been discovered, or at least documented, if they had tested 000 under all circumstances and upgraded the firmware for Australia to ensure it works.

My understanding is that in countries where roaming is implemented a ‘loss of tower’ causes the phone to search for another accepting network. If that is correct the ‘solution’ for avoiding another debacle should not be difficult to achieve, corporate and government intrigues aside.

From my understanding of what I heard last night, it wasn’t a loss-of-tower event. Phones were still pairing to Optus towers; somewhere between those towers and the outside world, connectivity was broken. Automatic ‘roaming’, as in other countries where a provincial or local state network doesn’t have any coverage (e.g. Canada, where national network operators share with provincial operators to allow service interstate), wouldn’t have worked, because at the user end the phone still had connectivity to the Optus tower. This tricked the phone into thinking network access was possible.

It would require a major redesign of phone operating systems and mobile networks to allow different call attempts when one fails. Some of it would be phone software and some would be network access allowances. For example: the first attempt calls through the paired tower (which fails because of an outage somewhere, congestion or poor signal); the second automatically tries a data connection through the paired tower; the third automatically tries another network where the phone detects a signal; and the fourth tries satellite or other means (such as public WiFi), if available.
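
The proposed chain of attempts can be sketched as an ordered list of connection methods, each tried in turn until one connects. The method names and the stand-in functions are hypothetical; a phone would need OS and network support to do anything like this for real.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each attempt is (description, function returning True if the call connected).
Attempt = Tuple[str, Callable[[], bool]]


def call_with_fallback(attempts: Sequence[Attempt]) -> Optional[str]:
    """Try each connection method in order; return the first that connects."""
    for name, try_connect in attempts:
        try:
            if try_connect():
                return name
        except OSError:
            # I/O errors (TimeoutError is a subclass) count as a failed attempt.
            continue
    return None  # every method failed


if __name__ == "__main__":
    # Simulated outage: the paired tower answers but its backhaul is dead.
    attempts = [
        ("voice via paired tower", lambda: False),
        ("data via paired tower", lambda: False),
        ("voice via another network", lambda: True),
        ("satellite", lambda: True),
    ]
    print(call_with_fallback(attempts))
```

In the simulated outage the call ends up on another network at the third step, and satellite is never needed.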

So what is meant by “the root cause”?

From the ABC:

Two days later, when Optus chief executive Ms Bayer Rosmarin faced questions from a Senate inquiry in Canberra, she said any reports blaming Singtel for the outage were based on a misunderstanding.

She instead blamed routers used by Optus, which were built by American technology company Cisco, for shutting down after Singtel carried out a software update at one of its internet exchanges.

The ex-CEO said:

“The root cause of the issue was that Cisco routers hit a fail-safe mechanism, which meant that each one of them independently shut down. That was triggered by the upgrade on the Singtel international peering network,” she said.

“That was misinterpreted by media as the root cause being the Singtel upgrade. But the trigger was the Singtel upgrade, and the root cause was the routers.”

This is just playing with words. Neither the software upgrade nor the Cisco routers on their own caused anything.

The root cause, as I understand the phrase, has two parts: firstly, nobody realised that the software upgrade would make the routers fall over, and secondly, they proceeded without testing it adequately (or at all).

This was the responsibility of either Optus or Singtel, whichever manages their upgrades, not Cisco. I reckon Cisco may have said some unkind words to them behind the scenes.

To add fuel to the fire, having worked out what went wrong, it took far too long to restore service. This was once again an in-house problem.

Does the board really think this kind of bulldust helps them at all?

Agree the root cause is not as publicly suggested.
I favour there being at least one further layer behind the question of what, if any, testing was done.

If Optus proceeded without testing, or the testing was inadequate, “Why was it so?”

Someone or group made a decision based on a risk assessment of the technical advice, knowledge and consequences if it went wrong. There’s a question as to whether that process of assessment and decision making was faulty (or did not exist). There’s also a question as to the corporate culture and whether the values reinforced by management skewed/rewarded the decision making process towards acceptance of greater or lesser risk.

Was the decision and call for the upgrade made out of Singapore?
What input, if any, did Optus Australia have into that process?
The follow up response to restore the Aussie services is a different question.
Did Optus Australian staff do exactly as they needed to, and were the protocols followed always going to take the time that it took?

Whatever the real answers are - looking elsewhere to a new broom is unlikely to offer the same outcome as a deep dive into the organisational management and cultures of Singtel and Optus.

Not major but, yes, would require a change on the phone, as I suggested at the time.

Some improvement could occur on the network side if the ‘tower’ could shut itself down, for the particular mobile network operator, if it loses effective backhaul connectivity - although there could be some negative implications of that.

That is never going to be a complete fix because the assumption is that there is a malfunction in the network of one mobile network operator. The complete fix is on the phone.
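
The network-side idea, a cell that takes itself off air for that operator when its backhaul dies, could look something like the watchdog below. The probe, the thresholds and the class name are all hypothetical. The two thresholds are where the negative implications show up: too eager and a cell drops phones on a brief backhaul blip; too slow and phones stay camped on a dead cell.

```python
class BackhaulWatchdog:
    """Hypothetical cell-side watchdog: take the cell off air for this
    operator after repeated backhaul failures, and bring it back only
    after repeated successes (hysteresis, to avoid flapping)."""

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 5) -> None:
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.radio_enabled = True
        self._consecutive_fails = 0
        self._consecutive_successes = 0

    def update(self, probe_ok: bool) -> bool:
        """Feed one backhaul probe result; return whether the cell stays on air."""
        if probe_ok:
            self._consecutive_fails = 0
            self._consecutive_successes += 1
            if not self.radio_enabled and self._consecutive_successes >= self.recover_threshold:
                self.radio_enabled = True
        else:
            self._consecutive_successes = 0
            self._consecutive_fails += 1
            if self.radio_enabled and self._consecutive_fails >= self.fail_threshold:
                self.radio_enabled = False
        return self.radio_enabled


if __name__ == "__main__":
    watchdog = BackhaulWatchdog(fail_threshold=3, recover_threshold=2)
    probes = [False, False, False, True, True]
    print([watchdog.update(ok) for ok in probes])
```

With those settings the probe sequence above yields `[True, True, False, False, True]`: the cell goes off air on the third consecutive failure and returns after two consecutive successes. Once off air, phones would have to look for another operator’s cell.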

Possibly, but bear in mind that within a year all 3 mobile networks will have shut down their 3G networks, and all voice calls will be over the data network anyway (VoLTE).

It wouldn’t have to be sequential. As long as you are outside with a clear view of the sky, using automated technology like AML (Advanced Mobile Location) but with the data packet sent via satellite, some kind of emergency message could be sent via satellite automatically, regardless of how you are going with trying to call 000.
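
A rough sketch of the ‘not sequential’ idea: start the 000 call attempt and an automatic satellite location message at the same time, so a struggling call never holds up the message. Both coroutines are hypothetical stand-ins with simulated delays, and the coordinates are illustrative. AML as deployed today sends location over the mobile network, so the satellite path here is an assumption.

```python
import asyncio
from typing import Tuple


async def try_000_call() -> bool:
    """Hypothetical stand-in: simulate a call attempt that fails after a while."""
    await asyncio.sleep(0.2)
    return False


async def send_satellite_location(lat: float, lon: float) -> str:
    """Hypothetical stand-in: simulate a short satellite location message."""
    await asyncio.sleep(0.05)
    return f"SOS {lat:.4f},{lon:.4f}"


async def emergency(lat: float, lon: float) -> Tuple[bool, str]:
    # Run both concurrently; the message goes out while the call is still trying.
    call_ok, sat_message = await asyncio.gather(
        try_000_call(), send_satellite_location(lat, lon)
    )
    return call_ok, sat_message


if __name__ == "__main__":
    print(asyncio.run(emergency(-19.2590, 146.8169)))
```

Here the satellite message completes long before the simulated call attempt gives up.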

It’s more complicated than that, AIUI. Singtel did a software upgrade on their network (which may or may not involve any Cisco equipment at all, and may or may not have been advised to Optus or any other network operator). That triggered a bunch of routing changes that propagated from Singtel’s network to Optus’s network (and presumably to other networks as well). The routing changes caused the Cisco routers inside Optus to panic and disconnect from the network. This was because of incorrect configuration by Optus of the Cisco routers.

The root cause was: the incorrect configuration.

My understanding is that a config parameter of the Cisco routers was at its default value. For an organisation the size of Optus it is their responsibility to RTFM and consider whether the default value is appropriate for their environment.

Cisco was quick to make public their disagreement with Optus regarding the characterisation of the failure.

I have personally experienced the Cisco panic problem. It used to occur for us at very inopportune times of day when IT staff should be asleep in bed. While the particular panic was different for us, the result was the same: the Cisco device shut down ports, thus breaking the network at that point. (Another difference is that for us any given Cisco device would shut down in isolation, independently of any other Cisco device, whereas the Optus experience was all Cisco core routers shutting down simultaneously, which indeed must have looked like a cyber attack. 😮)
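
To illustrate how one default setting can take out every core router at once: if each router independently enforces the same limit, the same burst of routing changes trips them all together. This assumes, purely for illustration, that the fail-safe is a cap on the number of routes a router will accept; the thread does not say which parameter was left at its default, and the limit values below are made up, not Cisco’s.

```python
from dataclasses import dataclass
from typing import List

HYPOTHETICAL_DEFAULT_LIMIT = 10_000  # made-up default, not a real Cisco value


@dataclass
class Router:
    name: str
    route_limit: int = HYPOTHETICAL_DEFAULT_LIMIT
    isolated: bool = False

    def receive(self, route_count: int) -> None:
        # Fail-safe: rather than risk overload, the router isolates itself
        # when it is handed more routes than its configured limit.
        if route_count > self.route_limit:
            self.isolated = True


def propagate(routers: List[Router], route_count: int) -> List[str]:
    """Send the same routing update to every router; return those that isolated."""
    for router in routers:
        router.receive(route_count)
    return [r.name for r in routers if r.isolated]


if __name__ == "__main__":
    defaults = [Router(f"core{i}") for i in range(4)]
    tuned = [Router(f"core{i}", route_limit=50_000) for i in range(4)]
    print("default limit:", propagate(defaults, 12_000))
    print("tuned limit:  ", propagate(tuned, 12_000))
```

In this toy model each router decides on its own, yet because they all share one default they all isolate together, while the routers with a limit sized for the network ride out the same update.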

There may be some merit in that claim. It is possible that they had reduced staff levels, particularly at the highest level of knowledge of the network, in the “remote sites” i.e. too much centralisation.

Another observation, now that we know that the equipment is Cisco, is that some Cisco devices have a dedicated management port. You are supposed to network those management ports together and to the control centre so that you don’t lose remote access even if the underlying network falls over.

Latest commentary, a fairly good timeline: Two weeks since the Optus outage, documents show backroom scrambling and urgent meetings occurred as the emergency played out - ABC News
