Banks are having a particularly difficult time with their public image. A recent Gallup poll of Americans’ confidence in 14 key institutions revealed that confidence in banks dropped by 22 percentage points between June 2006 and June 2016. By comparison, confidence in other major US institutions such as Congress, newspapers, television news, and organized religion dropped by around 10%.
Seeking to improve their bruised image following the 2007-2009 US housing bubble and financial crisis, banks invested heavily in customer service and online services. 24/7 online banking is now the norm, with an ever-increasing list of services on offer. However, several very public service outages have put banks under pressure again. For example, HSBC UK’s reputation took a beating when the bank’s website went down twice in 2016, and customers took to social media to voice their discontent. Other banks have experienced similar service outages that have cost the industry millions of dollars.
The Challenge: Complexity and the Unknown
Big IT budgets are not enough to prevent service outages. The reason lies in the pace of technological change and ever-growing infrastructure complexity, which have outpaced the ability of IT teams to keep up to date with new systems and services. Often, new systems pose compatibility issues with existing or legacy systems. And with thousands of bank server files controlling service configurations, even the best-resourced, best-managed, and best-intentioned IT teams cannot keep track of them all.
Test environments are not the solution, because typically they don’t encompass all the legacy systems and infrastructure layers that must be integrated with the new services. This means that interdependent functionalities might only be tested once the new system goes live. Not a good approach.
To make things even trickier, the causes of service outages are often unknown. A case in point is a recent University of Chicago study of unplanned outages. Of the 516 cases studied, 48% had unknown causes. The culprits could have been misconfigurations, malware, or incompatible software.
Not knowing what is causing a service availability issue is a recipe for repeated failures, and it is not a good place to be if you are a major bank, subject to more regulation and public scrutiny than other businesses. And with the rate of technological change showing no sign of slowing down, IT teams will continue to struggle to ensure a risk-free infrastructure that is configured according to industry best practices.
From Reactive to Proactive. From Unknown to Known
Identifying the cause of unknown service outages in complex, large-scale banking environments calls for a solution with a quality assurance mindset. To be effective, the environment must be checked as frequently as it changes. Since testing these complex environments requires huge resources, and in most cases even interferes with the normal operation of the system, only a non-intrusive and automated solution can validate the systems’ resiliency upon each change.
On top of that, there is a need for system-wide visibility across all layers of the infrastructure, coupled with the ability to continuously and proactively identify resiliency risks. An automated solution, based on a dynamic knowledgebase aligned with vendor and community experience and recommendations and coupled with an internal process, is the best recipe to mitigate IT risk. The process may differ from one organization to another, but it will allow IT to improve operational excellence by moving to a proactive mode. Identifying and promptly correcting misconfigurations before they impact critical business services is not only key to running more efficient IT operations, but also to winning back customers’ hearts.
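To make the idea of a knowledgebase-driven, non-intrusive check more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the key=value file format, the rule IDs, the setting names (timeout_seconds, failover_host), and the thresholds are all hypothetical. The point is that the check only reads a copy of the configuration text and never touches the running system, so it could be rerun after every change.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """One knowledgebase entry (hypothetical structure)."""
    rule_id: str
    description: str
    # Returns True when the parsed settings comply with the recommendation.
    check: Callable[[Dict[str, str]], bool]


def parse_config(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and '#' comments.

    Assumes a plain key=value format; real server files vary widely.
    """
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical rules standing in for vendor and community recommendations.
KNOWLEDGE_BASE: List[Rule] = [
    Rule(
        "R1",
        "Connection timeout should be set and be at most 30 seconds",
        lambda c: c.get("timeout_seconds", "").isdigit()
        and int(c["timeout_seconds"]) <= 30,
    ),
    Rule(
        "R2",
        "A failover host should be configured",
        lambda c: bool(c.get("failover_host")),
    ),
]


def find_risks(config_text: str, rules: List[Rule] = KNOWLEDGE_BASE) -> List[str]:
    """Return the IDs of rules that the given configuration violates."""
    settings = parse_config(config_text)
    return [rule.rule_id for rule in rules if not rule.check(settings)]


if __name__ == "__main__":
    sample = "# payments gateway (made-up example)\ntimeout_seconds=120\n"
    for rule_id in find_risks(sample):
        print("risk:", rule_id)
```

In practice the rule set would be far larger and updated continuously, and the findings would feed whatever internal remediation process the organization has chosen; the sketch only shows the read-only, rule-by-rule shape of such a check.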