Why DR testing doesn’t work
by Doron Pinhas
VP, Field Operations
When it comes to disaster recovery and business continuity, there’s an elephant in the room that a lot of people don’t really like to talk about: DR testing. Every DR strategy I’ve come across uses DR testing to validate effectiveness. But the truth is a full, by-the-book DR test is really, really hard to do correctly. It takes a lot of advance planning, a lot of time to execute, and usually a lot of money. Almost every company will take shortcuts like:
- Testing just a few key portions of the infrastructure, rather than testing the full DR environment. Companies may, for example, test very few business services and postpone the rest to a future test.
- Keeping storage, database, or application management servers, domain/name servers, or file servers online while performing the test.
- Conducting orderly system shutdowns to protect production systems, rather than simulating the abrupt cessation of operations that would occur in a disaster.
- Testing failover servers but not applications.
- Testing applications but not simulating the actual load the application must bear following a full site recovery.
- Neglecting to test dependencies, data inconsistencies and mapping errors that may exist between SAN devices and hosts, or any of the other errors that can cause a recovery to fail.
All of these shortcuts may help a company get through the test, but the results are incomplete and definitely DO NOT indicate how well the DR systems in place will react in the event of a true disaster.
This is a problem that has plagued me my entire career, and it is why I became such a believer in automated DR monitoring technology that I helped found Continuity Software. DR management software is able to penetrate deeper into the environment to ensure the infrastructure status is always aligned with the protection goals. For instance, our RecoverGuard software analyzes dependencies between IT assets and the business services they support because it maintains the most comprehensive and constantly updated documentation of IT resources and dependencies in the production and DR sites.
DR management software can perform tasks that are simply too cumbersome or complex for humans to perform, such as assessing the accuracy of intricate mappings, making sure all replicas exist and are consistent, and identifying RTO/RPO violations. I’ll tell you, I’m a believer!
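To make one of those tasks concrete, here is a minimal Python sketch of an RPO check: flag any volume that has no replica at all, or whose newest replica is older than its recovery point objective. This is an illustration only, not how RecoverGuard works internally. The `Volume` record, its fields, and the `find_violations` function are all hypothetical names, and a real tool would pull this data from the storage arrays and hosts rather than a hand-built list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Volume:
    """Hypothetical inventory record for one protected volume."""
    name: str
    rpo: timedelta                        # maximum tolerable data-loss window
    last_replica_at: Optional[datetime]   # None means no replica exists


def find_violations(volumes: List[Volume], now: datetime) -> List[Tuple[str, str]]:
    """Return (volume name, reason) pairs for missing or stale replicas."""
    violations = []
    for vol in volumes:
        if vol.last_replica_at is None:
            violations.append((vol.name, "no replica found"))
            continue
        age = now - vol.last_replica_at
        if age > vol.rpo:
            violations.append((vol.name, f"replica age {age} exceeds RPO {vol.rpo}"))
    return violations


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    now = datetime(2024, 1, 1, 12, 0)
    inventory = [
        Volume("orders-db", timedelta(minutes=15), datetime(2024, 1, 1, 11, 50)),
        Volume("billing-db", timedelta(minutes=15), datetime(2024, 1, 1, 11, 0)),
        Volume("file-share", timedelta(hours=1), None),
    ]
    for name, reason in find_violations(inventory, now):
        print(f"{name}: {reason}")
```

Even this toy version shows why the job suits software: the check is trivial for one volume, but it has to run against every volume, every time something changes.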