Among other things, BCP personnel bear the responsibility for recoverability in case of disaster. The BCP manager must verify that, at any given time, data can be recovered and operations can be resumed successfully according to the policies (RPO, RTO) set by the organization. This is all fine and dandy in theory, but take a moment to think about it: how exactly will a BCP manager determine recoverability status at any given time?
At best, a DR test is performed every quarter. Suppose you were responsible for the IT recoverability of a large financial institution, and DR exercises were performed on January 1st, April 1st and so on. What would you tell the CIO should he ask you on February 15 whether IT is recoverable? Would you be able to confidently answer “Yes”? The honest and correct answer would be: “Sir, I do not know. We fixed the glitches found in the Jan 1 DR test, so I guess IT was recoverable back then. But now, I cannot say for sure. Probably not.” It gets worse, right? Sure it does. Further deliberation exposes other weak spots in DR testing that we all experience: the DR test included only a small portion of IT and not all critical systems, production wasn’t really shut down or cut off during the test, end users and load scenarios weren’t really simulated. I can go on and on. And so the question remains: how will BCP evaluate readiness for DR at all times?
Let’s compare notes with other datacenter departments. How does the network security team know that the network is secure? How does a system administrator know that a server is malfunctioning? The answer is simple: they have visibility into their domain. In other words, they have tools that let them explore their area of responsibility, get an up-to-date, detailed status, and receive automatic notification when something goes wrong. BCP, like any other IT department, must have the right tools for the job. Yet unlike system administrators, database administrators and the like, BCP needs a management solution that provides visibility into all IT layers, not just servers or just database configurations. Furthermore, DRM (disaster recovery management) solutions must be capable of analyzing the dependencies between the different layers and finding recovery vulnerabilities.
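To make "analyzing the dependencies between the different layers" a little more concrete, here is a minimal Python sketch. It is not how any particular DRM product works. The inventory model (`Database`, `Volume`), the host names and the `find_recovery_gaps` function are all hypothetical, invented purely for illustration. It assumes you already have an inventory of which volumes each production database uses and which production hosts have a DR standby.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Volume:
    name: str
    replicated_to_dr: bool  # hypothetical flag: is this volume mirrored to the DR site?


@dataclass
class Database:
    name: str
    host: str  # production server the database runs on
    volumes: List[Volume] = field(default_factory=list)


def find_recovery_gaps(databases: List[Database], dr_hosts: Dict[str, str]) -> List[str]:
    """Return one human-readable finding per cross-layer recovery gap."""
    findings = []
    for db in databases:
        # Storage layer: every volume the database uses must be replicated,
        # otherwise the copy at the DR site is incomplete.
        for vol in db.volumes:
            if not vol.replicated_to_dr:
                findings.append(f"{db.name}: volume {vol.name} is not replicated to DR")
        # Server layer: a standby host must exist to bring the replica up.
        if db.host not in dr_hosts:
            findings.append(f"{db.name}: no DR standby mapped for host {db.host}")
    return findings


# Illustrative inventory (all names are made up).
inventory = [
    Database("billing", "prod-db-01", [Volume("data01", True), Volume("logs01", False)]),
    Database("crm", "prod-db-02", [Volume("data02", True)]),
]
dr_hosts = {"prod-db-01": "dr-db-01"}  # production host -> DR standby

for finding in find_recovery_gaps(inventory, dr_hosts):
    print(finding)
```

Each layer looks fine on its own here: the billing database is up, its data volume is replicated, and a standby server exists. The gap only shows when the layers are examined together, which is the kind of check a storage admin or DBA working alone would be unlikely to run.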
Can you imagine any organization with a seven-figure-plus IT budget not purchasing a server performance solution such as HP Performance Manager or IBM Tivoli Monitoring? Or network monitoring and event management solutions such as CA Spectrum/eHealth or HP NNM? Of course not, because it’s clear that datacenter monitoring requires automation (the task is too complex to be performed continuously and accurately by human beings) and that without automated monitoring, suboptimal operation and downtime are unavoidable. BCP/recovery management is no different. Without a DRM solution, BCP personnel are “blind”, unaware of the datacenter’s status in terms of readiness for recovery. They must put their trust and faith in the hands of IT teams whose first priority is production. Those teams might be kind enough to share some technical details with the BCP team, but a working datacenter is not built on mere kindness and “favors”. It is built on intelligent processes that lead to efficient, goal-oriented teamwork.
The good news is that high-end DRM solutions have emerged in the last few years, giving BCP personnel just the tools they were missing. Products such as RecoverGuard by Continuity Software and Disaster Recovery Advisor by Symantec provide BCP staff with a real-time, business-oriented status of readiness for disaster (covering both HA and DR). These analytics tools automatically identify hidden HA/DR risks and notify the user as soon as they arise. They also let users explore the different IT layers and understand the dependencies between production databases, servers, storage, remote storage, DR servers and so on. If you are thinking about deploying a DRM solution, note that Continuity Software offers a risk-free 48-hour RecoverGuard pilot.
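As a rough illustration of what an always-current readiness status could look like, the Python sketch below checks each system's last successful replication time against the RPO. It is a toy example under stated assumptions, not a description of RecoverGuard or Disaster Recovery Advisor. The `rpo_violations` function, the system names, the timestamps and the one-hour RPO are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def rpo_violations(last_replicated: Dict[str, datetime],
                   rpo: timedelta,
                   now: datetime) -> List[str]:
    """Return the systems whose newest DR copy is older than the allowed RPO."""
    return sorted(
        name for name, stamp in last_replicated.items() if now - stamp > rpo
    )


# Hypothetical snapshot taken on February 15 at noon, between two DR tests.
now = datetime(2024, 2, 15, 12, 0)
last_replicated = {
    "billing": datetime(2024, 2, 15, 11, 50),  # 10 minutes behind
    "crm": datetime(2024, 2, 15, 7, 0),        # 5 hours behind
}

# Assumed policy: no more than one hour of data may be lost.
print(rpo_violations(last_replicated, timedelta(hours=1), now))
```

Run on a schedule rather than once a quarter, a check of this kind lets the answer to "are we recoverable today?" rest on current data instead of on the memory of the last DR exercise.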
To guarantee successful recovery 365 days a year, BCP/recovery personnel must have solutions that provide visibility across the entire datacenter and that automatically and continuously analyze datacenter configuration to ensure no recovery gaps or vulnerabilities exist. Over the past few years, such DRM solutions have grown into an integral part of large IT organizations, as it became apparent that they help avoid significant downtime and data loss.
I for one believe it is a major milestone in the everlasting struggle to control HA/DR.