Shift Left to Get Resilience Right
Testing code before a software release is fundamental to software development. But whereas testing was once the culmination of development, over roughly the past five years it has shifted left: testing now takes place earlier in the development cycle, because it simply makes sense to discover and fix bugs and errors early, before they become ingrained in the code. Not only is it easier to fix defects in the initial development stages, it is also far less expensive.
Today, the shift-left sensibility, whose mantra is “test early, test often,” is standard in CI/CD-based software development. In our recent webinar, “Shifting Left on Cloud Infrastructure Availability,” we made a strong case for adding resilience testing to the series of tests developers (and QA) continuously run before releasing a version. Our rationale for testing resilience early parallels the idea behind development’s shift left: it lets enterprises validate the resilience of their cloud infrastructure as changes are made and new applications enter the environment. And although cloud providers share responsibility for uptime, it ultimately falls to the enterprise to assure service availability.
Maintaining resilience in the cloud is a complex challenge for enterprise IT teams. An enterprise may have hundreds of applications running across multiple public and private clouds. Add to that the dizzying frequency of new features, capabilities and services released by cloud providers, at times 10 or more announcements per day. Multiply that by the number of cloud providers an enterprise uses, and simply keeping up with the volume of changes and updates becomes a challenge for the IT environment in its own right.
But the number of new features is only part of the complexity. Many different IT hands also go into maintaining an enterprise’s environment, including in-house teams as well as those of vendors and solution providers, which introduces another dimension to the resilience challenge. And in modern architectures such as microservices, it gets more complex still.
When it comes to resilience assurance in cloud environments, we now understand that this kind of complexity lays the groundwork for misconfigurations and single points of failure, which in turn lead to service outages and security risks. That is why it is only logical to extend early, continuous automated testing to resilience.
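To make this concrete, here is a minimal sketch of what one such automated check might look like: scanning service configurations for single points of failure. The service names, fields and thresholds are hypothetical; a real check would read live infrastructure configuration rather than an inline list.

```python
# Hypothetical sketch: flag services whose configuration implies a
# single point of failure (one replica, or one availability zone).

def find_single_points_of_failure(services):
    """Return (name, reason) pairs for services lacking redundancy."""
    findings = []
    for svc in services:
        if svc.get("replicas", 1) < 2:
            findings.append((svc["name"], "only one replica"))
        elif len(svc.get("zones", [])) < 2:
            findings.append((svc["name"], "single availability zone"))
    return findings

# Invented example data standing in for real deployment configuration.
services = [
    {"name": "checkout", "replicas": 3, "zones": ["us-east-1a", "us-east-1b"]},
    {"name": "billing",  "replicas": 1, "zones": ["us-east-1a"]},
    {"name": "search",   "replicas": 2, "zones": ["us-east-1a"]},
]

for name, reason in find_single_points_of_failure(services):
    print(f"{name}: {reason}")
```

Run early and on every change, a check like this surfaces the redundancy gaps described above before they ever reach production.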
What we’re actually talking about is proactive resilience testing. Given that the enterprise’s most critical data and its availability are at stake, such testing should become integral to the CI/CD process, particularly since applications are developed and delivered in very short cycles, leaving no room for unforeseen bugs or misconfigurations that cause unavailability.
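As an illustration of the kind of assertion a proactive resilience test might make in a pipeline, the sketch below simulates the loss of each instance in turn and verifies that the remaining capacity still meets demand (an "N-1" check). The capacity and demand figures are invented for the example.

```python
# Hypothetical sketch of a proactive "N-1" resilience test:
# simulate each single-instance failure and check demand is still met.

def survives_single_failure(capacities, demand):
    """True if demand is still covered after any one instance fails."""
    total = sum(capacities)
    return all(total - c >= demand for c in capacities)

print(survives_single_failure([100, 100, 100], 180))  # True: any two instances cover demand
print(survives_single_failure([200, 50, 50], 180))    # False: losing the large instance breaks it
```

Wired into CI/CD, a failing result like the second case would block the release, which is exactly the point of shifting resilience testing left.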
Automated, early resilience testing is a rapid and revealing process. In a future post we’ll share examples of the types of misconfigurations and other errors discovered through early resilience tests. Had they remained undetected, they would have led to disruptions, outages and security risks.