by Yaniv Valik
SR DR Specialist, DR Assurance Group
This gap was recently uncovered at a large insurance company. Because it comes up often during routine infrastructure HA/DR monitoring, and it inevitably surprises the IT organization, I thought it would be a good one to focus on today.
Gap: Insufficient DR Configuration/Resources
Risk: Extended recovery times, Recovery Time Objective violation
How does it happen? DR and production infrastructures are usually not identical. When building a DR data center, organizations tend to allocate fewer resources than their production environments have. If the DR configuration has significantly fewer resources than production, there is a good chance it will be unable to take over production properly upon failover. It is not unusual, for example, to find a production environment with multiple paths to storage or software while the DR environment has too few, or even just a single path. It is also common to find DR sites with misconfigured kernel parameters, or with insufficient memory or CPU to support full production load.
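To make the comparison concrete, here is a minimal sketch of how a production host and its DR counterpart could be checked against each other for the gaps listed above: storage paths, memory, CPU, and kernel parameters. It assumes you already have an inventory snapshot of each server; the `HostConfig` type, the `find_dr_gaps` function, the host names, and all values are hypothetical and for illustration only, not part of any product's API. Requiring kernel parameters to match exactly is a simplifying assumption; in practice some parameters may legitimately differ between sites.

```python
from dataclasses import dataclass, field


@dataclass
class HostConfig:
    """Hypothetical inventory snapshot of one server's resources."""
    name: str
    storage_paths: int  # number of active paths to storage
    memory_gb: int
    cpu_cores: int
    kernel_params: dict[str, str] = field(default_factory=dict)


def find_dr_gaps(prod: HostConfig, dr: HostConfig) -> list[str]:
    """Return descriptions of every place the DR host falls short of production."""
    gaps: list[str] = []
    if dr.storage_paths < prod.storage_paths:
        gaps.append(f"storage paths: prod={prod.storage_paths}, dr={dr.storage_paths}")
    if dr.memory_gb < prod.memory_gb:
        gaps.append(f"memory: prod={prod.memory_gb} GB, dr={dr.memory_gb} GB")
    if dr.cpu_cores < prod.cpu_cores:
        gaps.append(f"cpu cores: prod={prod.cpu_cores}, dr={dr.cpu_cores}")
    # Assumption: each production kernel parameter should match exactly on DR;
    # a parameter missing on DR is reported as None.
    for key, value in prod.kernel_params.items():
        dr_value = dr.kernel_params.get(key)
        if dr_value != value:
            gaps.append(f"kernel param {key}: prod={value}, dr={dr_value}")
    return gaps


if __name__ == "__main__":
    # Illustrative values only.
    prod = HostConfig("db-prod-01", 4, 256, 32,
                      {"vm.swappiness": "10", "kernel.shmmax": "68719476736"})
    dr = HostConfig("db-dr-01", 1, 128, 32,
                    {"kernel.shmmax": "68719476736"})
    for gap in find_dr_gaps(prod, dr):
        print(gap)
```

Running a check like this continuously, rather than only during a scheduled DR test, is what lets such drift be caught before a failover depends on it.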
What is the impact? When the DR site cannot take over production as planned, business operations cannot resume in accordance with the company's established SLA. In the best-case scenario, IT must devote additional resources to unplanned configuration of servers and storage. In the worst-case scenario, the company must incur unplanned capital expenses.
Why does the DR test miss it? Most DR tests do not simulate full production load, so these gaps remain undetected. And because the DR site is mostly offline, the issue typically does not surface until an emergency occurs.
If this is of interest to you, you can check out some other typical gaps on our website: https://www.continuitysoftware.com/products/availabilityguard/