 
RecoverGuard™ Sample Gap 01225

File system is synchronized to remote site  

Result: File system is unusable at DR site

The signature

In this example, a critical file system is stored on three SAN volumes. The data is periodically synchronized to the remote site, but the copies of the three volumes are not all taken at the same point in time.
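The signature can be illustrated with a minimal sketch. It assumes the last synchronization time of each replica volume is already known; the volume names, timestamps, and function names below are hypothetical and do not come from RecoverGuard itself.

```python
from datetime import datetime, timedelta

def replica_age_skew(last_sync_times):
    """Return the time difference between the oldest and newest replica copy."""
    times = list(last_sync_times.values())
    return max(times) - min(times)

def has_age_gap(last_sync_times, tolerance=timedelta(0)):
    """True if the replica copies were not all taken at the same point in time."""
    return replica_age_skew(last_sync_times) > tolerance

# Hypothetical data: three SAN volumes backing one file system.
syncs = {
    "vol_a": datetime(2007, 3, 1, 2, 0, 0),
    "vol_b": datetime(2007, 3, 1, 2, 0, 0),
    "vol_c": datetime(2007, 3, 1, 1, 15, 0),  # copied 45 minutes earlier
}

skew = replica_age_skew(syncs)
gap_found = has_age_gap(syncs)
```

With this sample data, one volume lags the other two, so the check flags the file system even though every individual copy completed successfully.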

 

The impact

In such a scenario, the copy is likely to be corrupt and unusable. If the file system is busy, or servers access large files (database files usually meet both criteria), corruption is extremely likely.

 

Technical details

File systems have certain built-in self-correction mechanisms, designed to overcome slight differences resulting from pending writes that were not successfully flushed from memory to disk because of an abrupt shutdown (such as a power failure or a “blue screen”). These mechanisms are not designed to handle disks that appear to “go back in time” by minutes or hours. Replicating disks at different points in time can easily lead to such scenarios, which seem completely “unnatural” to the operating system at the DR site. Journaled file systems will not help, because they either (a) journal only file system metadata, and not the data itself, or (b) keep journal data spread across the disks themselves, which leaves the journal prone to the same time-difference corruption.
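A toy model may make the “go back in time” effect concrete. It is only a sketch under simplifying assumptions: metadata lives on one hypothetical volume and data blocks on another, whereas real file systems spread both across all disks. All names are illustrative.

```python
# Two hypothetical volumes: one maps file names to block ids (metadata),
# the other holds the data blocks themselves.
meta_disk = {}
data_disk = {}

def write_file(name, content, block_id):
    """Write the data block first, then the metadata entry that points to it."""
    data_disk[block_id] = content
    meta_disk[name] = block_id

def snapshot(disk):
    """Point-in-time copy of a single volume."""
    return dict(disk)

def dangling_files(meta, data):
    """Files whose metadata points at a block missing from the data volume."""
    return sorted(name for name, block in meta.items() if block not in data)

write_file("a.db", "v1", 1)
data_copy = snapshot(data_disk)   # data volume replicated at time T1
write_file("b.db", "v1", 2)       # production keeps writing
meta_copy = snapshot(meta_disk)   # metadata volume replicated later, at T2

dangling = dangling_files(meta_copy, data_copy)
production_ok = dangling_files(meta_disk, data_disk) == []
```

Production is consistent at every moment, yet the replica pair is not: the metadata copy refers to a block that the older data copy never received, a state no abrupt shutdown could have produced.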

Can it happen to me?

This is one of the top five gaps found in even the most well-kept environments. There are dozens of reasons it could happen, and with nearly every one of them it is almost impossible to tell that the problem has occurred. Because replication itself is successful, there is no indication to the user that something is wrong. Some examples are:

All the disk synchs are correctly managed by one script, but another script runs afterwards, perhaps on a different host, and has a stray mapping to one of the source disks.

All the disks are added to one array consistency group (or device group), which is used to synch them simultaneously. Note that the definition of the array consistency group is completely separate from the definition of the file system and its underlying logical volume and volume group. It is easy to associate a disk newly added to the volume group on the host side with the wrong array consistency group.

One of the disks is copied over a different cross-array link from the others. This link might be much busier and cause the synch (or mirror, or split, etc., depending on the vendor terminology) to take more time.

There are dozens of permutations and variations of the same theme.
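The consistency group example above lends itself to a simple cross-check between the host-side and array-side definitions. The following is a minimal sketch, assuming both mappings have already been collected; the disk and group names are hypothetical, and no real array or volume manager API is used.

```python
from collections import defaultdict

def consistency_groups_for(volume_group_disks, disk_to_group):
    """Group a volume group's disks by the array consistency group they belong to.

    Disks with no consistency group are collected under the key None.
    """
    groups = defaultdict(list)
    for disk in volume_group_disks:
        groups[disk_to_group.get(disk)].append(disk)
    return dict(groups)

def is_consistently_replicated(volume_group_disks, disk_to_group):
    """True only if every disk sits in one and the same consistency group."""
    groups = consistency_groups_for(volume_group_disks, disk_to_group)
    return len(groups) == 1 and None not in groups

# Hypothetical host-side view: disks in the volume group behind the file system.
vg_disks = ["disk1", "disk2", "disk3"]

# Hypothetical array-side view: disk3 was added to the wrong group.
array_groups = {"disk1": "cg_prod", "disk2": "cg_prod", "disk3": "cg_other"}

groups = consistency_groups_for(vg_disks, array_groups)
ok = is_consistently_replicated(vg_disks, array_groups)
```

Here the volume group spans two consistency groups, so its disks can be synchronized at different times even though each group replicates without error.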

 

 


 
   
"Today’s enterprises are managing complex datacenter environments that consist of a variety of applications, platforms, and storage arrays. These environments experience daily changes in configuration and distribution of resources. These realities challenge IT organizations to ensure consistency of data and applications across production and recovery sites. Replication is a commonly used technology to move data from production to recovery site; any slight change in configuration on the production site that is not replicated at the recovery site can cause the environment to become unrecoverable."

Rhoda Phillips, Research Manager, Storage Software, IDC
 

© Copyright 2007 Continuity Software. All rights reserved.