Configuration Drifts Between Production and DR

Result: Failed recovery
The signature
Configuration drifts between a production host and its standby DR host
The impact
In the event of a disaster, failover to the DR server will not succeed. Manual intervention will be needed to install missing hardware and software, upgrade software, and configure kernel parameters correctly. This typically involves extended recovery time and an RTO violation, since identifying the configuration errors commonly takes days (or even weeks).
Technical details
In this example, the DR server that corresponds to a production host does not have enough resources to run the application with reasonable performance. A few products are missing on the DR server, and others are installed at lower versions than on production. In addition, kernel parameters are set to significantly lower values than in production. Many applications depend on other products installed on the server and on kernel parameter configuration. For example, Oracle is well known to be sensitive to the configuration of semaphore-related kernel parameters.
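The three kinds of gap described here (resources, installed products, kernel parameters) can be checked mechanically once both hosts' settings have been collected. The following is only a minimal sketch in Python. The `HostConfig` structure, the `find_drift` function, and the field names are hypothetical and exist only for illustration. It assumes versions are purely numeric dotted strings and that a lower value on the DR side is the problem worth reporting. How the settings are gathered from each host is left out.

```python
from dataclasses import dataclass, field


@dataclass
class HostConfig:
    """Hypothetical snapshot of one host's settings (illustration only)."""
    memory_gb: int
    cpu_count: int
    packages: dict[str, str] = field(default_factory=dict)       # name -> version
    kernel_params: dict[str, int] = field(default_factory=dict)  # name -> value


def version_key(version: str) -> tuple[int, ...]:
    """Convert a numeric dotted version such as '11.2.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def find_drift(prod: HostConfig, dr: HostConfig) -> list[str]:
    """List every setting where the DR host falls short of production."""
    findings = []

    # Resources: DR should have at least as much as production.
    if dr.memory_gb < prod.memory_gb:
        findings.append(
            f"memory: DR has {dr.memory_gb} GB, production has {prod.memory_gb} GB"
        )
    if dr.cpu_count < prod.cpu_count:
        findings.append(
            f"cpu: DR has {dr.cpu_count} CPUs, production has {prod.cpu_count}"
        )

    # Products: report anything missing or installed at a lower version on DR.
    for name, prod_version in sorted(prod.packages.items()):
        dr_version = dr.packages.get(name)
        if dr_version is None:
            findings.append(f"package {name}: missing on DR")
        elif version_key(dr_version) < version_key(prod_version):
            findings.append(
                f"package {name}: DR has {dr_version}, production has {prod_version}"
            )

    # Kernel parameters: report anything unset or set lower on DR.
    for name, prod_value in sorted(prod.kernel_params.items()):
        dr_value = dr.kernel_params.get(name)
        if dr_value is None:
            findings.append(f"kernel parameter {name}: not set on DR")
        elif dr_value < prod_value:
            findings.append(
                f"kernel parameter {name}: DR has {dr_value}, production has {prod_value}"
            )

    return findings
```

Calling `find_drift(prod, dr)` with two snapshots returns one line per gap, and an empty list means none of the checked settings is lower on DR. Only settings present on production are examined, so extra products on the DR host are not flagged.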
Can it happen to me?
This is a very common gap in DR environments. The configuration of a host involves so many details that it can be very difficult to keep a DR server fully synchronized with its production host at all times. Also, DR tests typically do not load the DR server with the expected production load, so these configuration issues go undetected.