Online Service Failures

It would be interesting to document the impacts on people who depend on online services to always be up and available and working properly. Here goes -

There are more than a few posts about banking systems going down, leading people to carry some cash or keep accounts at multiple banks as insurance against being unable to pay. Rare as it might be, it happens.

A tweeted apology to those affected surely got them into their vehicles quickly if they did not carry their backup ‘key’. (Tesla drivers were not entirely reliant on the app. “There will be a secondary mechanism to get in or out of the car beyond the app, the difficulty will come for drivers if they are not carrying it”).

Tesla drivers might be more attentive to having their ‘backup’ with them to get into their cars. Seems similar to the NBN admonishing everyone to always have a fully charged mobile for when the NBN goes missing.

10 Likes

When I had a real job, on occasion I was tasked with doing disaster analysis and recovery plans. Basically, the analysis was: nature of event, probability of occurrence, impact, cost of mitigation.

Nobody should think online services will “always” be up. An acquaintance designed software for telephone exchanges. Their criterion was to reduce outages to less than 20 minutes per century. Close, but not “always”. There is a small probability that a solar flare could take out most communications, as happened in 1859. The message is clear: have a strategy to deal with not having online services.
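To put the "20 minutes per century" target in perspective, here is a rough back-of-envelope sketch that converts it to an availability percentage. The function name and the 365.25-day year are my own assumptions for illustration, not anything taken from the exchange design itself.

```python
# Minutes in an average year, assuming 365.25 days (an assumption for this sketch).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def availability(downtime_minutes: float, period_years: float) -> float:
    """Fraction of time a service is up, given total downtime over a period."""
    total_minutes = period_years * MINUTES_PER_YEAR
    return 1.0 - downtime_minutes / total_minutes


# The telephone exchange target quoted above: 20 minutes of outage per 100 years.
exchange = availability(downtime_minutes=20, period_years=100)
print(f"Exchange availability: {exchange:.7%}")
```

That works out to about 99.99996%, which is very close to 100% but still not "always".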

100% reliability is unattainable. I did the analysis of a company computer failure (e.g. a fuel truck running into the side of the building and blowing up). Highly unlikely, but the analysis showed that a makeshift center could be set up in less than a week by buying replacement equipment and restoring as much as possible from offsite backups. The possible losses were huge, but the probability was so low that the risk was not big enough to justify having a redundant computer system offsite. Banks don’t have that luxury and do have redundant sites.
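The trade-off described here (probability, impact, cost of mitigation) can be sketched as a simple expected-loss comparison. Every number below is hypothetical and chosen only to show the shape of the calculation, and the function names are mine. A real analysis would also weigh things a single expected value hides, such as whether the business could survive the loss at all.

```python
def expected_annual_loss(probability_per_year: float, impact: float) -> float:
    """Expected loss per year: chance of the event times what it would cost."""
    return probability_per_year * impact


def mitigation_justified(
    probability_per_year: float,
    impact: float,
    residual_impact: float,
    mitigation_cost_per_year: float,
) -> bool:
    """True if the expected loss avoided exceeds the yearly cost of mitigating."""
    avoided = expected_annual_loss(probability_per_year, impact) - expected_annual_loss(
        probability_per_year, residual_impact
    )
    return avoided > mitigation_cost_per_year


# Hypothetical figures: a 1-in-10,000-per-year event costing $50M without a
# redundant site and $5M with one, against $500k a year to run that site.
company = mitigation_justified(0.0001, 50_000_000, 5_000_000, 500_000)
print("Redundant site justified for the company:", company)

# Same hypothetical event and site cost, but a far larger impact figure.
larger = mitigation_justified(0.0001, 20_000_000_000, 1_000_000_000, 500_000)
print("Redundant site justified with the larger impact:", larger)
```

With these made-up numbers the expected loss avoided is only about $4,500 a year in the first case, nowhere near the cost of the site, while the second case avoids about $1.9M a year and comes out the other way.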

For many of us, lack of online services is an inconvenience. Often, the least-cost strategy is to wait it out. Where there is a chance of real loss, analysis is beneficial. What loss, what chance? I know one person who has NBN, 5G wireless and Starlink (Elon Musk's satellites) connections and a router that can merge these to provide an almost unbreakable service. This is extreme but, if your livelihood depends on unbroken service, perhaps justified.
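As a rough illustration of why stacking connections helps, here is a minimal sketch of the combined availability of several links. It assumes the links fail independently, which is optimistic (a power cut or a router fault would take out all three at once), and the per-link availability figures are invented for the example, not measured.

```python
from math import prod


def combined_availability(link_availabilities) -> float:
    """Chance that at least one link is up, assuming independent failures."""
    # The service is down only when every link is down at the same time.
    all_down = prod(1.0 - a for a in link_availabilities)
    return 1.0 - all_down


# Hypothetical per-link availabilities, for illustration only.
links = {"NBN": 0.995, "5G": 0.99, "satellite": 0.99}
combined = combined_availability(links.values())
print(f"Combined availability: {combined:.5%}")
```

Under those assumptions, three links that are each down about 1% of the time or less combine to roughly 99.99995%. The weak point then becomes whatever they share, such as the router or the mains power.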

4 Likes

I worked for one of the big four banks, at its primary computer center, and this did almost happen. A fuel tanker jack-knifed, overturned, and started leaking fuel right outside the center. We were ready to evacuate and switch all processing to an alternative computer center within 15 minutes.

This DR scenario assumed that all facilities would be gone, including all technical staff and managers.

3 Likes

I used to work in IT and I would never, ever assume that IT services will not go down. It always has happened and always will happen.

Look at the recent Facebook outage. A misconfiguration of a core service prevented access to their sites, and no number of redundant data centres would have fixed that. Google have also had service failures, and their DR strategy assumes that entire countries may go offline.

Yes, I always carry cash in case the EFTPOS is down. I have also kept a small emergency stash available ever since my wallet and phone were stolen and I had to wait for replacement cards, borrowing from friends to get by.

6 Likes

Speaking as a Tesla driver - I did not notice the remote server outage at all. When I read about it, I checked back on the date and time it happened and found we had been driving (and charging) the car with no problem.
We very rarely use the Tesla mobile app to unlock the car or to ‘start’ the car because we use the key fobs. One of the reasons for this is that Tesla (and common sense) has told us using the app on a mobile phone will not work when the car or phone or both are in an area with no Telstra mobile data coverage.
The Tesla server outage did not stop Tesla fob transmitters nor Tesla RFID cards from working as they communicate directly with the car without going through the Internet to a remote server.

8 Likes

Online services such as banking, government or retail that have outages rarely cause an issue for me, as I generally have the luxury of waiting until the service is available again. My issue is when either my ISP or the NBN has a service outage, as I’m a remote contract worker. I had an NBN outage last week for an hour which prevented me from attending a regularly scheduled Zoom meeting with a client. After going through the usual painful troubleshooting process with my ISP, who determined it was an NBN issue, I went off to have lunch. Thankfully the NBN service was back up when I returned from lunch. I have a backup in case of a prolonged outage, namely a mobile broadband WiFi modem, which I recharge when I know the outage can’t be resolved quickly. This has “saved” me a number of times over the last 3-4 years.

5 Likes

Did you also have a plan B if Zoom was down, e.g. Skype or FB or similar?

2 Likes

In this case I wasn’t running the meeting, so there was no backup option. The meeting went ahead without me. I do have backup options if I’m organising meetings, namely Google Meet or Microsoft Teams.

4 Likes

Does anyone know if the various Tesla key systems work at the parking area of the Black Mountain Tower facility in Canberra? The strong signals put out by the tower often prevent RF and other signals working, and the NRMA Road Service carry RF shields that they can deploy between the transmitter and the car to allow people to get into their vehicles.

To join the queue, I too worked in DR planning in an organisation that had to have 99.99% availability 365.25 days a year. I too had to develop strategies for the most outlandish scenarios, just in case they happened. This included those discussed by others: floods, fires, meteor strike, terrorism, even sabotage. We created both hot (up and continuing business within an hour at most) and cold backup sites.
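For anyone wondering what a 99.99% target leaves in hand, a quick sketch of the yearly downtime budget follows. The function name is made up for the example, and it assumes the target is measured over a full 365.25-day year with no planned maintenance windows excluded.

```python
def downtime_budget_minutes(availability_target: float, days_per_year: float = 365.25) -> float:
    """Minutes of outage per year that still meet the availability target."""
    minutes_per_year = days_per_year * 24 * 60
    return (1.0 - availability_target) * minutes_per_year


budget = downtime_budget_minutes(0.9999)
print(f"Allowed downtime at 99.99%: {budget:.1f} minutes per year")
```

That is a little under 53 minutes a year, so a single failover to a hot site that takes close to the full hour would use up the whole year's allowance.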

It’s all about the risks you are prepared to take, compared to what you have to deliver. Unfortunately, private enterprises (apart from the financial institutions) tend to overlook Disaster Recovery Planning. This lack of planning has ruined many businesses that could not recover from a relatively minor calamity.

1 Like

We have driven our Tesla to the parking area at the base of the tower on Black Mountain. Our car’s battery-operated key fobs worked perfectly (as did the car).

2 Likes