Backup systems, redundant systems – call them what you like, but have them in place so you can fail over in milliseconds if your main system goes down! Unfortunately for the hundreds of thousands, or probably millions, of air passengers who bore the brunt of failed systems in the past few months, the airports and airlines they patronized did not always heed this warning, and when they did, it did not always help.
A sampling of outages. Note: No backup systems took over.
The distant past: December 2017 – The busiest airport in the world, Atlanta’s Hartsfield-Jackson, experienced an 11-hour electrical outage during the busiest time of the year. The outage, caused by a fire, led to the cancellation of over 1,400 flights and thousands of direct and related delays.
More recently, in March 2018, a “technical issue” at Sydney Airport in Australia took down the airport’s passenger processing and security screening systems.
Also in March, Air Canada’s airport systems, check-in and customer call centers were down for several hours throughout Canada due to an unknown “network-wide computer outage.”
And, at the beginning of April, a “system-wide tech failure” of the system which monitors air traffic caused the delay of close to 15,000 flights across Europe. Affected airports included Schiphol in Amsterdam.
Officials at Schiphol Airport must have been crying, “Why me?” when their airport was hit once again in April, this time by a power outage that forced it to shut down for several hours.
Not always clear why outages occur. What’s clear is that planning and prevention are needed.
Does it seem unusual that when huge outages such as these are IT-related, the reasons given for them are rather vague? Actually, it’s not. The reality is that with all the interdependent physical and virtual systems IT teams need to attend to, the true cause of such outages is hard for them to pinpoint because they often lack visibility into these systems. And, with the intense pressure to get systems back up after an outage, IT teams don’t have the “luxury” of investigating the root cause. Don’t forget – it could be that their system didn’t fail over even though they had backup systems in place, that redundancy didn’t extend to all their systems, or that there was in fact no system in place ready to take over.
What about the electrical outages that brought chaos to airports? Here, too, the principle of redundancy applies. And, when planning a redundant system, it helps to envision a worst-case scenario. For example, there was indeed such a system in place at the Atlanta airport. However, the switches and cables of both the main and backup systems ran through the same piece of equipment – and it caught fire (the worst case…). In contrast, in Dallas, electricity flows into the airport from two sets of power lines coming from two different locations, in an effort to prevent outages. Nonetheless, a backup system would be critical there too.
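The Atlanta example comes down to a simple check: does the backup path depend on anything the main path also depends on? Here is a minimal sketch of that check. The component names are hypothetical and invented for illustration; they do not describe the actual equipment at any airport.

```python
def shared_components(primary, backup):
    """Return the components that both paths depend on, sorted by name.

    Anything in the result is a single point of failure: if it goes
    down, the main path and the backup path go down together.
    """
    return sorted(set(primary) & set(backup))


# Hypothetical layout: both paths run through the same equipment room.
shared_room = {
    "primary": ["utility_feed_a", "equipment_room_1", "main_bus"],
    "backup": ["utility_feed_b", "equipment_room_1", "backup_bus"],
}

# Hypothetical layout: the two paths share nothing.
separated = {
    "primary": ["utility_feed_a", "equipment_room_1", "main_bus"],
    "backup": ["utility_feed_b", "equipment_room_2", "backup_bus"],
}

for name, layout in [("shared_room", shared_room), ("separated", separated)]:
    overlap = shared_components(layout["primary"], layout["backup"])
    if overlap:
        print(f"{name}: single points of failure: {', '.join(overlap)}")
    else:
        print(f"{name}: no shared components found")
```

A check like this is only as good as the inventory behind it, which is another reason visibility into every dependency matters.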
The fallout for airlines and airports of not having reliable backup systems in place multiplies when you consider that data may have been lost during the outage and may not be fully recoverable.
Then, there’s the financial aspect, with hundreds of millions of dollars in lost revenue from cancelled flights, costs to get running again, and compensation to passengers.
Yes, the passengers, with accumulated loss of millions of hours and missed connections, missed meetings, missed vacations, etc., etc.
Keeping critical systems resilient is vital and is most definitely an achievable goal. Airlines and airports need to do their homework and find the resilience solution to invest in that best meets their needs.