Configuration drift. It’s one of those things that you know and fear, yet kind of shrug at when you hear it. “There is not much we can do about it, so we just learn to live with it.”
Configuration drift is a naturally occurring phenomenon in data center environments, the result of ongoing changes to software and hardware. It commonly occurs when hardware or software configurations in the production (primary) environment are modified and deviate from the configuration of the recovery (secondary) environment.
Once configuration drift occurs, ensuring the stability of the environment becomes extremely difficult. In fact, many high availability and disaster recovery failures can be traced back to configuration drift.
Old Problem. New Consequences.
While configuration drift is not a new problem, its magnitude and impact are growing dramatically as the pace of change escalates across all IT environments. Virtualization, which masks certain configuration setups, adds yet another level of complexity.
With a growing sense of urgency, there is renewed energy around the rallying cry to prevent configuration drift. Automation tools such as Puppet and Chef promise to make configuration drift a thing of the past. But as our recent survey shows, adoption of software-defined automation tools is far from widespread.
Since configuration drift is not drifting away any time soon (excuse the pun and the skepticism), we must find better ways to do the best we can for now. That doesn’t mean we shrug and continue with business as usual. Even if we cannot avoid configuration drift entirely, we can take a proactive approach to minimizing its spread and its impact on infrastructure availability.
Three Ways to Identify Configuration Drift
There are several ways in which configuration drift is typically identified today:
- Something bad happens
If you get to keep your job, you just fix it and wait for the next outage to happen…
- We got lucky
As we keep tinkering with the environment, we can stumble upon a visible configuration drift and perhaps fix it before something bad happens. The problem is that in many cases we introduce additional drift in the process…
- Periodic test/audit
The conventional method for identifying configuration drift involves manually reviewing each production configuration and comparing it to the recovery or secondary configuration. This approach is time consuming and expensive, and is therefore typically done only once or twice a year (if at all), leaving the infrastructure at risk between audits. It’s easy to see why this approach is more lip service than an effective strategy: in most cases, either something bad happens or we get lucky before we ever get to the test.
We Can Do Better.
According to our survey results, change management and proactive identification of risks are the top two challenges IT organizations face in ensuring service availability.
While being lucky is always desirable, it cannot be a business strategy. The only way to effectively identify configuration drifts in a scalable manner is through automated detection.
A daily scan of your entire environment can pinpoint risky configuration drifts before something bad happens. There is a growing number of organizations that are already using IT Operations Analytics to turn these findings into actionable insights and proactively eliminate infrastructure downtime risks before they impact critical business services.
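The core of such an automated scan is a simple comparison: collect the configuration of the primary and recovery environments and flag every key that differs or is missing on one side. The sketch below is a minimal illustration of that idea; the function name `detect_drift` and the sample configuration keys are hypothetical, and a real scanner would pull configurations from live hosts and run on a schedule rather than compare inline dictionaries.

```python
import json


def detect_drift(primary: dict, recovery: dict) -> dict:
    """Compare two flat config mappings and report every drifted key.

    Returns a dict keyed by setting name, each entry showing the value
    on the primary side and on the recovery side ("<missing>" if absent).
    """
    drift = {}
    for key in sorted(set(primary) | set(recovery)):
        p = primary.get(key, "<missing>")
        r = recovery.get(key, "<missing>")
        if p != r:
            drift[key] = {"primary": p, "recovery": r}
    return drift


# Illustrative snapshots of a production and a recovery environment.
prod = {"kernel": "5.15.0", "nginx": "1.24", "max_conns": 1024}
dr = {"kernel": "5.15.0", "nginx": "1.22", "heap_mb": 512}

# Print the drift report; here nginx versions differ, and each side
# has a setting the other lacks.
print(json.dumps(detect_drift(prod, dr), indent=2))
```

Run daily against every primary/secondary pair, a report like this turns drift from something you discover during an outage into something you fix the morning it appears.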