4 Oracle DataGuard Recovery Risks

by Yaniv Valik on July 19, 2012

Your company decided to replicate production Oracle databases to a remote DR site using Data Guard. The database administration crew set it up and ran several tests to ensure that it works properly. How do you know that it will also work properly tomorrow or the day after? Many things can go wrong, rendering the standby Oracle database unfit for recovery. Some risks are not specifically related to Oracle configuration. Others may be very subtle and difficult to identify manually. The standby database may fail to start when you need it the most. Or maybe it will start, but some of the data will be missing (ouch!). Or maybe it’ll “just” perform very badly after the fail-over and cause service disruption. I’ve selected 4 examples of common Oracle Data Guard vulnerabilities to share with you. Obviously, there are thousands of risks that may affect the availability and recoverability of an Oracle database in Data Guard mode. So, here they are:

1. Standby database not synchronized with its primary Oracle database

Like the vast majority of companies that choose Data Guard, your company probably set it up in the default “MAX PERFORMANCE” mode, which puts the performance of the primary database first and standby synchronization second. Redo is written asynchronously to the standby database, and if there are delays, the standby database falls behind. It is reasonable to assume that the gap between the primary and standby databases is largest during rush hours. If a failure occurs during this time, a significant amount of data could be lost. BCP managers: how would you know whether Data Guard synchronization complies with your RPO goal? You don’t!

Of course, there are many other reasons for the standby database to fall behind the primary, such as network issues that cause heartbeat failures, storage configuration on the standby servers, and more.
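To make the RPO question concrete, here is a minimal Python sketch of a lag check. It assumes the lag figures come from the `v$dataguard_stats` view on the standby and that each value arrives as an interval string such as `+00 00:01:23`. The function names and the five-minute RPO in the usage note are hypothetical, chosen only for illustration.

```python
import re
from datetime import timedelta

# Query to run on the standby (assumption: lag is reported as an
# interval string like '+00 00:01:23', i.e. days hours:minutes:seconds).
LAG_QUERY = (
    "SELECT name, value FROM v$dataguard_stats "
    "WHERE name IN ('transport lag', 'apply lag')"
)

_LAG_RE = re.compile(r"^\+(\d+) (\d{2}):(\d{2}):(\d{2})")


def parse_lag(value):
    """Convert '+DD HH:MM:SS' into a timedelta; None if the lag is unknown."""
    if value is None:
        return None
    match = _LAG_RE.match(value.strip())
    if match is None:
        return None
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def rpo_violations(rows, rpo):
    """Return the lag metrics that exceed the RPO or cannot be read at all."""
    violations = []
    for name, value in rows:
        lag = parse_lag(value)
        # An unreadable lag is treated as a violation: not knowing is a risk too.
        if lag is None or lag > rpo:
            violations.append(name)
    return violations
```

Feeding it the rows returned by `LAG_QUERY`, for example `rpo_violations(rows, timedelta(minutes=5))`, gives the list of metrics that break a five-minute RPO. Sampling this during rush hours, rather than only after a quiet test window, is what shows whether the goal actually holds.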

2. “Force Logging” being disabled for a primary Oracle database

Enabling “Force Logging” is one of many Oracle best practices for Data Guard environments.

A few words about Force Logging: Oracle provides a means of forcing redo records to be written for changes to the database, even where NOLOGGING has been specified in DDL statements. Any unlogged operation would invalidate the standby database, and substantial DBA intervention would be required to propagate the unlogged changes manually.
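A check for this setting can be sketched in a few lines of Python. The sketch assumes a DB-API style cursor connected to the primary database (for example from an Oracle driver) and reads the `FORCE_LOGGING` column of `v$database`. The function name is hypothetical.

```python
# Assumption: run against the primary; the column holds 'YES' when
# force logging is enabled for the whole database.
FORCE_LOGGING_QUERY = "SELECT force_logging FROM v$database"


def force_logging_enabled(cursor):
    """Return True only if the database reports FORCE_LOGGING = 'YES'."""
    cursor.execute(FORCE_LOGGING_QUERY)
    row = cursor.fetchone()
    # A missing row or any value other than 'YES' is reported as disabled.
    return row is not None and str(row[0]).strip().upper() == "YES"
```

If the check returns `False`, the usual remedy is for a DBA to run `ALTER DATABASE FORCE LOGGING;` on the primary, after weighing the extra redo it will generate.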

3. The archiver of an Oracle instance is stopped

On the primary database, Data Guard uses an archiver process to collect transaction redo data and transmit it to standby destinations. The archiver is a key process in a Data Guard environment. Without it, synchronization will not take place. Every now and then, DBAs doing maintenance work stop the archiver process and forget to bring it back online. It’s only human to make such mistakes from time to time, and there’s nothing you can do to avoid them.
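Since the mistake itself cannot be prevented, the practical option is to detect it quickly. The Python sketch below assumes a DB-API style cursor on the primary and reads the `ARCHIVER` column of `gv$instance`, which reports a state such as `STARTED`, `STOPPED` or `FAILED` for each instance. The function name is hypothetical.

```python
# Assumption: gv$instance is readable by the monitoring account and
# returns one row per instance (several rows on RAC).
ARCHIVER_QUERY = "SELECT instance_name, archiver FROM gv$instance"


def instances_without_archiver(cursor):
    """Return (instance, state) pairs whose archiver is not STARTED."""
    cursor.execute(ARCHIVER_QUERY)
    problems = []
    for instance_name, state in cursor.fetchall():
        normalized = (state or "").strip().upper()
        if normalized != "STARTED":
            problems.append((instance_name, normalized or "UNKNOWN"))
    return problems
```

An empty list means every instance reports a running archiver. Anything else is worth an alert, especially right after a maintenance window.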

4. Critical Primary-Standby OS Configuration Differences

Differences in the configuration of key kernel parameters (open files, semaphores, shared memory, threads) can result either in a failure to start the instance on the standby server when you fail over, or in the instance providing an “unexplained” poor service level (stability, performance).
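A comparison like this is easy to sketch in Python. The example assumes Linux servers on both sides and that the output of `sysctl -a` has been captured from each host as text. The list of keys is only a sample of the parameter families mentioned above, and the function names are hypothetical.

```python
# Sample Linux kernel parameters covering open files, semaphores,
# shared memory and threads (assumption: both hosts run Linux).
KEY_PARAMS = ["fs.file-max", "kernel.sem", "kernel.shmmax", "kernel.threads-max"]


def parse_sysctl(text):
    """Parse 'key = value' lines from `sysctl -a` output into a dict."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            # Collapse tabs and repeated spaces so values compare cleanly.
            params[key.strip()] = " ".join(value.split())
    return params


def diff_params(primary, standby, keys=KEY_PARAMS):
    """Return {key: (primary_value, standby_value)} for every mismatch."""
    diffs = {}
    for key in keys:
        p_value, s_value = primary.get(key), standby.get(key)
        if p_value != s_value:
            diffs[key] = (p_value, s_value)
    return diffs
```

A parameter that is missing on one side shows up with `None` in its place, so an incomplete capture is flagged instead of silently passing.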

Thousands of things can go wrong every day without you even knowing about it. Testing some of the systems once in a while is hardly enough. The only viable way is to automate DR verification with a proper tool. A tool that will perform a daily read-only scan of your infrastructure and guarantee that availability and protection levels are high, and that no new risks have emerged. A tool that can handle all of the above examples and much more. Come visit us at www.continuitysoftware.com and check out our risk-free 48-hour pilot for RecoverGuard.

Yaniv Valik
VP Product Management & Customer Success at Continuity Software