HA Gap: What can happen when cluster passive nodes have no/partial access to cluster storage volumes

By Yaniv Valik,
Sr. DR Specialist, DR Assurance Team

One of the most common gaps in clustered environments relates to passive nodes getting out of sync with the currently active node.

A typical scenario occurs when new storage volumes are added to production servers. Occasionally, usually when IT teams are overloaded, the new volumes are mapped only to the currently active node. Because this configuration error has no effect until a cluster failover, it goes unnoticed. Then, when a failover does happen, the new storage volumes are not available on the new active node (formerly the passive node), and the data on them cannot be mounted. The administrator has to identify the missing devices and map them to the new active node. This usually involves downtime, which is exactly what you expect a cluster to eliminate.

Scheduled, controlled switchovers are often used to overcome this issue. However, even when switchovers are performed on a regular basis, an unexpected error like the one described here still results in downtime. Automated configuration monitoring technology (like RecoverGuard) can reduce the number of these critical errors to zero. For instance, RecoverGuard will open a ticket and alert you when the passive nodes of a cluster do not have access to the same SAN storage volumes as the currently active node, so you can fix the mapping before a failover event occurs.
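
To make the idea behind such a check concrete, here is a minimal Python sketch. It is not RecoverGuard's implementation. It assumes you have already collected, by whatever means your environment allows, the set of SAN volume identifiers visible to each cluster node. The function name find_missing_volumes, the node names, and the LUN identifiers are all hypothetical.

```python
from typing import Dict, Set


def find_missing_volumes(
    active_node: str, node_volumes: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """For each passive node, return the volumes that the active node
    can access but the passive node cannot."""
    active = node_volumes[active_node]
    gaps: Dict[str, Set[str]] = {}
    for node, volumes in node_volumes.items():
        if node == active_node:
            continue
        missing = active - volumes  # set difference: seen by active only
        if missing:
            gaps[node] = missing
    return gaps


# Hypothetical inventory: lun-03 was recently added and mapped
# only to the active node (node-a) and to node-c.
inventory = {
    "node-a": {"lun-01", "lun-02", "lun-03"},
    "node-b": {"lun-01", "lun-02"},
    "node-c": {"lun-01", "lun-02", "lun-03"},
}

gaps = find_missing_volumes("node-a", inventory)
for node, missing in sorted(gaps.items()):
    print(f"{node} cannot see: {', '.join(sorted(missing))}")
```

With this sample inventory the script reports that node-b cannot see lun-03. Run on a schedule, a comparison like this could flag the gap while node-a is still active, instead of at failover time.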
