RecoverGuard™ Sample Gap
01225
File system is synchronized to remote site

Result: File system is unusable at DR site
The signature
In this example, a critical file system is stored on three SAN volumes. The data is periodically synchronized to the remote site, but the copies of the three volumes are not all taken at the same point in time.
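The signature can be illustrated with a minimal sketch, assuming the time of the last successful copy of each volume is known. The volume names, timestamps, and the `copy_age_skew` helper are hypothetical and are not part of any product or array API.

```python
from datetime import datetime, timedelta

def copy_age_skew(copy_times):
    """Return the gap between the oldest and the newest volume copy."""
    return max(copy_times.values()) - min(copy_times.values())

# Hypothetical last-copy times for the three SAN volumes of one file system.
copy_times = {
    "vol1": datetime(2024, 1, 1, 2, 0, 0),
    "vol2": datetime(2024, 1, 1, 2, 0, 0),
    "vol3": datetime(2024, 1, 1, 2, 47, 0),  # copied 47 minutes later
}

skew = copy_age_skew(copy_times)

# All copies must represent the same point in time, so any skew is a gap.
is_consistent = skew == timedelta(0)
```

Here `vol3` is 47 minutes newer than the other two volumes, so the set of copies does not represent a single point in time.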
The impact
When this scenario occurs, the copy is likely to be corrupt and unusable. If the file system is busy or the servers access large files (database files usually meet both criteria), corruption is extremely likely.
Technical details
File systems have built-in self-correction mechanisms designed to overcome slight differences caused by pending writes that were not flushed from memory to disk before an abrupt shutdown (such as a power failure or “blue screen”). These mechanisms are not designed to handle disks that appear to “go back in time” by minutes or hours. Replicating disks at different points in time can easily lead to such scenarios, which seem completely “unnatural” to the operating system at the DR site. Journaled file systems will not help, because they either (a) journal only file system metadata and not the data itself, or (b) keep the journal on the disks themselves, where it is prone to the same time-difference corruption.
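The effect can be sketched with a toy model. This is not a real file system: it assumes one disk holding metadata (file name to block number) and one disk holding data blocks, and the `TwoDiskFS` class and `dangling_entries` helper are hypothetical names used only for illustration.

```python
import copy

class TwoDiskFS:
    """Toy file system: metadata on one disk, data blocks on another."""

    def __init__(self):
        self.meta = {}        # metadata disk: file name -> block number
        self.data = {}        # data disk: block number -> contents
        self.next_block = 0

    def write_file(self, name, contents):
        block = self.next_block
        self.next_block += 1
        self.data[block] = contents   # write the data block first
        self.meta[name] = block       # then point the metadata at it

def dangling_entries(meta, data):
    """Return file names whose metadata points at a block the data disk lacks."""
    return sorted(name for name, block in meta.items() if block not in data)

fs = TwoDiskFS()
fs.write_file("a.db", "v1")
data_copy = copy.deepcopy(fs.data)    # data disk replicated early
fs.write_file("b.db", "v1")
fs.write_file("c.db", "v1")
meta_copy = copy.deepcopy(fs.meta)    # metadata disk replicated later

# The live file system is consistent; the mixed-age replica is not.
live_broken = dangling_entries(fs.meta, fs.data)
replica_broken = dangling_entries(meta_copy, data_copy)
```

In this model the replica's metadata refers to `b.db` and `c.db`, whose blocks never reached the older data copy. Replaying pending writes cannot repair that, because from the data disk's point of view those writes never happened.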
Can it happen to me?
This is one of the top five gaps found in even the most well-kept environments. There are dozens of reasons it could happen, and with nearly every one of them it is nearly impossible to tell that the problem has occurred. Because replication itself is successful, there is no indication to the user that something is wrong. Some examples are:
All the disk synchs are correctly managed by one script, but there is another out there that runs afterwards, perhaps on a different host, which has a stray mapping to one of the source disks.
All the disks are added to one array consistency group (or device group), which is used to synch them simultaneously. Note that the definition of the array consistency group is completely separate from the definition of the file system and its underlying logical volume and volume group. It is easy to associate a disk newly added to the volume group on the host side with the wrong array consistency group.
One of the disks is copied over a different cross-array link than the others. This link might be much busier, causing the synch (or mirror, or split, etc., depending on the vendor terminology) to take more time.
There are dozens of permutations and variations of the same theme.
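The consistency-group example above lends itself to a mechanical cross-check between the host-side and array-side definitions. The following is a minimal sketch, assuming both definitions have already been collected into plain Python sets. The disk names, group names, and both functions are hypothetical and do not correspond to any vendor tool.

```python
def replication_groups(vg_disks, consistency_groups):
    """Map each volume group disk to the array consistency groups containing it."""
    result = {}
    for disk in sorted(vg_disks):
        result[disk] = [
            group
            for group, members in sorted(consistency_groups.items())
            if disk in members
        ]
    return result

def has_gap(vg_disks, consistency_groups):
    """True unless every disk belongs to exactly one group, the same for all."""
    owners = replication_groups(vg_disks, consistency_groups)
    distinct = {tuple(groups) for groups in owners.values()}
    return not (len(distinct) == 1 and len(next(iter(distinct))) == 1)

# Host side: disks in the volume group under the file system (hypothetical).
vg_disks = {"disk1", "disk2", "disk3"}

# Array side: disk3 was added to the wrong consistency group (hypothetical).
consistency_groups = {
    "cg_prod_fs": {"disk1", "disk2"},
    "cg_other": {"disk3"},
}

owners = replication_groups(vg_disks, consistency_groups)
gap_found = has_gap(vg_disks, consistency_groups)
```

A disk that belongs to no group, or to a different group than its peers, is reported as a gap. A check of this kind covers only the group-mapping cause, not stray scripts or slow links.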