Gap Analysis #6: Configuration Drift between Production and HA

IT Resilience & Downtime Prevention Blog


by jmmerk on February 16, 2011

by Yaniv Valik
SR DR Specialist, DR Assurance Group

Here’s a gap that we frequently see in HA environments.

Gap: Configuration Drifts between HA Cluster Nodes

Risk: Downtime; manual intervention needed to recover

How does it happen?

While there are many ways this can occur, let’s look at one example. The currently active node is configured with redundancy both at the HBA level and in its DNS configuration, but the passive node is not. A single HBA or a single DNS server is a single point of failure, so upon fail-over or switch-over to the currently passive node, the applications running on this cluster will suffer from reduced availability (lower MTBF) and more downtime. In addition, the passive node is configured with a significantly lower limit on open files, which may lead to application failures. Moreover, the passive node has only 1GB of swap, while the active node was configured with an additional 4GB. Upon fail-over, the applications may not have sufficient memory to run properly. Lastly, differences in installed products may have various impacts, depending on the product type.
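One way to make this kind of drift visible is to compare a snapshot of each node’s settings side by side. The Python sketch below is illustrative only: the key names, the values, and the `find_drift` helper are assumptions made up for this example, loosely modeled on the scenario above, and are not taken from any specific HA product.

```python
def find_drift(active, passive):
    """Return {setting: (active_value, passive_value)} for every setting that differs.

    A setting present on only one node is reported with None for the other node.
    """
    drift = {}
    for key in sorted(set(active) | set(passive)):
        a = active.get(key)
        p = passive.get(key)
        if a != p:
            drift[key] = (a, p)
    return drift


# Hypothetical snapshots; all names and numbers are made up for illustration.
active_node = {
    "hba_paths": 2,
    "dns_servers": 2,
    "max_open_files": 65536,
    "swap_gb": 5,
    "installed_products": ("backup-agent", "db-server"),
}
passive_node = {
    "hba_paths": 1,
    "dns_servers": 1,
    "max_open_files": 1024,
    "swap_gb": 1,
    "installed_products": ("db-server",),
}

for setting, (a, p) in find_drift(active_node, passive_node).items():
    print(f"{setting}: active={a!r} passive={p!r}")
```

In this made-up case every setting differs, so each one is reported. In practice the hard part is gathering complete and comparable snapshots from both nodes, not the comparison itself.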

What is the impact?

This will vary depending upon the specific drift, but can include a failure to fail over or switch over to the other node (causing downtime), or reduced performance after fail-over/switch-over, which will at best slow operations down and at worst leave the node unable to carry the load.

Can it happen to me?

This situation occurs frequently in HA environments. The configuration of a host involves so many details that it is very difficult to ensure an HA server is fully synchronized with its production host at all times.
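To give a sense of how many separate sources even a small comparison touches, here is a minimal sketch that gathers three of the settings mentioned earlier on a single host. It assumes a Linux system with `/etc/resolv.conf` and `/proc/meminfo`. The function names are hypothetical, and the open-files limit it reads is the soft limit of the Python process itself, which may differ from the limit the clustered application actually runs under.

```python
import resource  # Unix-only standard library module


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def parse_swap_total_kb(meminfo_text):
    """Return SwapTotal in kB from /proc/meminfo-style text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return None


def local_snapshot():
    """Collect a few settings from the local host (Linux paths assumed)."""
    soft_nofile, _hard_nofile = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open("/etc/resolv.conf") as f:
        dns_servers = parse_nameservers(f.read())
    with open("/proc/meminfo") as f:
        swap_total_kb = parse_swap_total_kb(f.read())
    return {
        "max_open_files": soft_nofile,
        "dns_servers": dns_servers,
        "swap_total_kb": swap_total_kb,
    }
```

Running `local_snapshot()` on each node and comparing the results would cover only a tiny fraction of a host’s configuration; HBA paths, installed products, and kernel parameters would each need their own collector.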
