In 2018, we saw that complex, hybrid IT systems are hard to maintain, especially when you don’t really know why they fail. There is a solution!
Well, 2018 is just about over, and the end of the year is a good time to take stock of what this past year was about. One way we’re doing that is by looking at the blogs we published. What issues concerned us and the community we serve? What was our take on the significant events that occurred? And now, what have we learned?
Some mega-outages in 2018
With IT outages figuring prominently in our blog posts, we were both critical and understanding. We were critical of enterprises whose online availability is vital to hundreds of thousands or even millions of people (banks, airlines, and don’t forget Black Friday!) for not knowing their resilience status or whether they would be able to recover from an outage or a cyber attack. We were also understanding, knowing how complex today’s IT environments are: they are hybrid, composed of public cloud, private cloud and on-premises systems, running new and legacy applications, and maintained by numerous in-house and outsourced IT teams.
New technologies can disrupt online availability
We also posted some blogs that discussed the technical aspects of keeping IT environments resilient. We pointed out that with new cloud technologies the pace of innovation is dizzying. We also looked at upgrading to higher-performing systems such as software-defined storage, and another post talked about how many systems are commissioned and decommissioned every week at the typical enterprise. In these and other posts, we cautioned that every change in the environment must be validated to ensure that the required connections and paths, both old and new, are (still) in place and operative. No matter what or who initiates the change, the responsibility for resilience rests with the enterprise.
Enterprises generally don’t know the reason for their outages or disruptions
Still another post waxed philosophical, asking where the stories we tell about the causes of IT outages originate and why we don’t or can’t go more deeply into the real reasons.
The thread running through all of this year’s posts is the realization that enterprises don’t know, or don’t have the time to find out, what caused their outages. It’s only logical that if the true cause remains unknown, the fix administered may not be the repair actually needed, and another outage is only a matter of time.
Thus, enterprises pay a heavy price for not knowing what’s going on within their IT environments. Though it’s not an uncommon problem, “not knowing” won’t get you a discount when the regulator comes around to collect the fine for service disruption, a growing phenomenon in several spheres (and a topic we also referred to this year). Nor, unfortunately, will it prevent the board or customers from demanding the CEO’s resignation.
IT resilience is definitely and demonstrably achievable. Our AvailabilityGuard™ solution provides visibility into the resilience status of today’s large, complex, interconnected hybrid IT environments. It identifies potential points of failure and shows how to repair them. Multiple reports and a clear, intuitive UI ease the work of maintaining resilience. AvailabilityGuard gives IT teams the information they need so that they “know.”
Here’s wishing that 2019’s a year of discovery for enterprises in which they find and implement the solution to their IT resilience challenges.