The IT world is quickly adopting a cloud-first strategy. The benefits of the cloud are clear: significant savings in hardware deployment, agility, scalability and elasticity, “guaranteed” uptime, and the ability to offload responsibility for operations (thus freeing up resources for other projects in the organization).
Another popular expectation is that the cloud will enable developers, DevOps and IT infrastructure engineers to focus on building and deploying sophisticated applications and services with increased flexibility, scalability and reliability, and at the speed of light.
To enable and support this kind of innovation, all major cloud providers keep pushing new improvements, features, capabilities and services at a pace we’ve never seen before. For example, in February Amazon announced that 497 new features were added to the AWS platform in just one quarter. These were part of the 1,403 significant new features added in 2017, up from a total of 1,017 in 2016. This translates to almost 4 new features a day!
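For readers who want to check the math, here is a minimal sketch of the arithmetic behind the "almost 4 a day" figure. It uses only the two annual totals quoted above; the function names are ours, and a 365-day year is assumed.

```python
# Annual AWS feature counts as quoted in the paragraph above.
FEATURES_2016 = 1017
FEATURES_2017 = 1403


def features_per_day(total_features, days_in_year=365):
    """Average number of new features released per calendar day."""
    return total_features / days_in_year


def yearly_growth(previous, current):
    """Relative year-over-year growth, as a fraction (0.10 means 10%)."""
    return (current - previous) / previous


if __name__ == "__main__":
    print(f"2017 pace: {features_per_day(FEATURES_2017):.2f} features per day")
    print(f"Growth over 2016: {yearly_growth(FEATURES_2016, FEATURES_2017):.0%}")
```

In other words, the release pace did not just stay high; it grew by more than a third in a single year.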
How do you keep up with such a huge pace of innovation?
Yes, there are some concerns that all these new features are complicating the cloud, but in general it seems like we all want to be at the bleeding edge of innovation and agility.
One important aspect of keeping up with this level of change is how to assure resilience in such a fast-paced environment. The problem is that every change might have an impact (big or small) on your resilience strategy, and someone needs to (1) understand what that impact is; (2) translate it for your specific environment; and (3) make the necessary changes to keep resilience intact.
A notable example of the resilience challenge is the recent announcement of Microsoft Azure Availability Zones. Their main purpose is to offer the “most comprehensive resiliency strategy.” Yet to implement them in your environment, you need to understand the relevant best practices and vendor recommendations, then revisit previously configured services to decide if and how to adapt them to the new capabilities. For a change of this magnitude, you practically need to re-architect and re-configure your entire resilience strategy to benefit from what it offers.
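To make the "revisit previously configured services" step concrete, here is a rough sketch of the kind of check it implies: flagging services whose instances are not spread across at least two zones. Everything in it is an assumption for illustration. The inventory is hard-coded and hypothetical, the zone labels are made up, and nothing here calls a real Azure API or reflects how any particular tool works.

```python
from collections import defaultdict

# Hypothetical inventory: (service name, instance name, zone or None).
# None means the instance was deployed without a zone assignment.
INVENTORY = [
    ("web", "web-1", "1"),
    ("web", "web-2", "2"),
    ("db", "db-1", "1"),
    ("db", "db-2", "1"),
    ("cache", "cache-1", None),
]


def find_zone_risks(inventory, min_zones=2):
    """Return {service: reason} for services not spread over min_zones zones."""
    zones_by_service = defaultdict(set)
    unzoned = defaultdict(list)
    for service, instance, zone in inventory:
        zones_by_service[service]  # make sure every service gets an entry
        if zone is None:
            unzoned[service].append(instance)
        else:
            zones_by_service[service].add(zone)

    risks = {}
    for service, zones in zones_by_service.items():
        if unzoned[service]:
            risks[service] = "instances without a zone: " + ", ".join(unzoned[service])
        elif len(zones) < min_zones:
            risks[service] = f"all instances in {len(zones)} zone(s)"
    return risks


if __name__ == "__main__":
    for service, reason in find_zone_risks(INVENTORY).items():
        print(f"{service}: {reason}")
```

Even this toy version shows why the work is ongoing: the check has to be re-run, and probably rewritten, every time the provider changes what "resilient" configuration looks like.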
In dynamic environments, risks to resiliency change all the time
Not every new enhancement has the same impact on resiliency, but every new feature may require you to revise your configuration, and every such change may introduce a potential new risk to resilience.
Cloud providers will continue to offer a rich and sophisticated set of building blocks for forming a resilient infrastructure, but the responsibility to use them wisely still lies with the end user. So keeping pace with this explosion of new capabilities, and making sure they are all configured for resiliency, is a bit like “chasing your tail.” Realistically, only a resilience assurance automation tool like ours, which utilizes deep knowledge, is driven by the community, and employs AI/ML algorithms, is capable of handling this challenge.
Want to learn more about AvailabilityGuard for Public Cloud resilience?