Gap Analysis #4: Point-in-time copies never tested

by jmmerk on September 19, 2010

by Yaniv Valik
Sr. DR Specialist, DR Assurance Group

So far in my Gap Analysis series we’ve covered replication inconsistencies, missing networking resources and tampering risk gaps. Today I’m going to take a look at what happens when point-in-time copies are never tested.

Gap: Point-in-time copies never tested

Risk: Data loss and increased time to recover

How does it happen? Point-in-time copies such as snapshots and BCVs (Business Continuance Volumes) are the second line of defense against human error, viruses and outages. The DR configuration for applications typically includes:

  • Multiple local point-in-time copies such as EMC TimeFinder, HDS ShadowImage/Snapshot, NetApp FlexClone/Snapshot, or CLARiiON SnapView;
  • Remote synchronous replication such as EMC SRDF, Hitachi TrueCopy, CLARiiON MirrorView, and NetApp SnapMirror; and
  • Local point-in-time copies on the remote site.

In addition, the copies may be mapped to the target DR servers, configured with multi-path software such as EMC PowerPath, Veritas DMP and MPIO, and managed by a logical volume manager such as Veritas VxVM.

Point-in-time copies can easily become corrupt without anyone noticing, unless the application is fully started and data integrity is thoroughly tested. Numerous scenarios can lead to such corruption; one example is when the replica devices do not all belong to the same consistency group. In that case the devices are not captured at the same instant, so a database's data volume and log volume, for instance, may reflect different points in time.
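The consistency-group condition can be checked from a storage inventory before any DR test runs. The sketch below is a minimal illustration, assuming a hypothetical inventory that maps each application to its replica device IDs and each device to its consistency group (None when it belongs to none). The function name and data layout are illustrative only, not any storage vendor's API.

```python
from __future__ import annotations


def find_inconsistent_apps(
    app_devices: dict[str, list[str]],
    device_group: dict[str, str | None],
) -> dict[str, set[str | None]]:
    """Flag applications whose replica devices do not all share one consistency group.

    Hypothetical inventory format: app_devices maps application -> replica
    device IDs; device_group maps device ID -> consistency group name.
    None, or a missing entry, means the device is in no group.
    """
    problems: dict[str, set[str | None]] = {}
    for app, devices in app_devices.items():
        groups = {device_group.get(dev) for dev in devices}
        # Only a single, real group means all devices are captured together.
        if len(groups) != 1 or None in groups:
            problems[app] = groups
    return problems


# Example inventory (hypothetical device IDs and group names).
app_devices = {"billing": ["0A1", "0A2", "0A3"], "crm": ["0B1", "0B2"]}
device_group = {"0A1": "cg_billing", "0A2": "cg_billing", "0A3": None,
                "0B1": "cg_crm", "0B2": "cg_crm"}
print(find_inconsistent_apps(app_devices, device_group))
```

Here `billing` is flagged because device `0A3` sits outside `cg_billing`, while `crm` passes. Treating an unlisted device the same as an ungrouped one errs on the side of reporting a possible gap.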

What is the impact? The replica is corrupt and unusable. The file system must be recreated at the disaster recovery site and the data restored from a recent backup, which increases the time to recover. All data created since the last backup is lost. Worse, in many cases a corrupted file system may still mount and appear usable; only close inspection of its contents reveals that the data is meaningless.
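What "close inspection of the content" means depends on the application. As a small, self-contained illustration, the sketch below uses SQLite as a stand-in for a production database: it opens a database file from a point-in-time copy read-only and runs SQLite's built-in `PRAGMA integrity_check`, which returns the single row `ok` when it finds no problems. The helper name is hypothetical, and a real test would use the application's own consistency tools after starting it on the DR server.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def check_sqlite_copy(path: str) -> tuple[bool, list[str]]:
    """Run SQLite's integrity check against a database file on a point-in-time copy.

    Returns (passed, messages). Hypothetical helper for illustration only.
    """
    # Open read-only through a URI so the check itself cannot modify the copy.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        # e.g. the file is missing on the mounted copy
        return False, [str(exc)]
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # e.g. "file is not a database" when even the header is garbage
        return False, [str(exc)]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    return messages == ["ok"], messages
```

A structural check like this is only a first step: SQLite treats an empty file as a valid empty database, so it should be paired with application-level checks, such as confirming that expected recent records are present.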

Why does the DR test miss this? This gap can be missed if the specific business service is not included in DR testing, or if the DR test only powers on the DR server without actually running the applications.
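One way to keep this gap visible is to audit DR test records per business service and count only tests that actually started the applications and validated the data. The sketch below assumes a hypothetical record format (`DrTestRecord`) and a one-year freshness window chosen purely for illustration; neither comes from a specific tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DrTestRecord:
    # Hypothetical record of one DR test run for a business service.
    service: str
    test_date: date
    apps_started: bool    # were the applications actually started on the DR server?
    data_validated: bool  # was data integrity on the point-in-time copy checked?


def untested_services(services: list[str], records: list[DrTestRecord],
                      today: date, max_age_days: int = 365) -> list[str]:
    """Return services with no recent DR test that both ran the apps and validated data."""
    cutoff = today - timedelta(days=max_age_days)
    covered = {
        r.service
        for r in records
        if r.apps_started and r.data_validated and r.test_date >= cutoff
    }
    return [s for s in services if s not in covered]


services = ["billing", "crm", "hr", "payroll"]
records = [
    DrTestRecord("billing", date(2010, 6, 1), True, True),
    DrTestRecord("crm", date(2010, 8, 1), False, False),  # server powered on only
    DrTestRecord("hr", date(2008, 1, 1), True, True),      # full test, but stale
]
print(untested_services(services, records, date(2010, 9, 19)))
# prints ['crm', 'hr', 'payroll']
```

A power-on-only test (`crm`) counts the same as no test at all, which mirrors the two ways the gap slips through.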

If this is of interest to you, you can check out some other typical gaps on our website: http://www.continuitysoftware.com/commongaps