Ensure High Availability on IBM PowerHA


You’ve invested a great deal of time and effort in building highly available PowerHA (formerly HACMP) clusters. But will those clusters work when you need them most? Are they guaranteed to fail over flawlessly no matter what happens?

Let’s face it: it is difficult to keep your cluster configuration perfectly aligned with vendor best practices, and in sync with changes in the other IT layers that PowerHA interfaces with (such as the OS, storage, networking and more). Unfortunately, even a small misconfiguration or discrepancy between cluster nodes can lead to failed fail-overs and painful outages at the worst possible time.

PowerHA Configuration Alignment with Storage and Replication

On a daily basis, AvailabilityGuard verifies that your underlying storage devices are accessible and configured to provide equal levels of availability and service. With AvailabilityGuard, you can be confident that clusters will fail over successfully, mount storage volumes and volume groups, and start applications – whether running on physical servers, logical partitions (LPARs) or in conjunction with PowerVM / VIO technology. When PowerHA SystemMirror is used, AvailabilityGuard ensures that your cluster is set correctly to manage the replication process and make LUNs accessible at the remote site, thus ensuring that multi-site fail-over and your DR plan will work.

Sample issues:

  • LUNs inaccessible to cluster nodes (local or remote), or accessible to unauthorized hosts
  • SCSI-3 reservation best practice violations
  • Incorrect replication settings for IBM DS Series, SVC, V7000, XIV – with either Metro Mirror or Global Mirror – and for non-IBM replication technologies such as EMC SRDF, HDS TrueCopy (HTC) or Universal Replicator (HUR), and more
  • HyperSwap best practices
  • Data misplaced on the wrong storage tier, or on unshared volumes
  • Fabric single points of failure, or masking/zoning misconfigurations that will cause fail-over to fail
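The first check in the list above – that every cluster node can actually see every shared LUN – can be sketched as a simple cross-node comparison. The node names and LUN serial numbers below are hypothetical; in practice this inventory would be collected from each node (e.g. from `lspv` output) and from the storage array.

```python
# Minimal sketch of a cross-node LUN visibility check (illustrative data only).

def find_luns_not_shared(node_luns):
    """Return (shared LUNs, per-node missing LUNs).

    node_luns: dict mapping node name -> set of LUN serial numbers visible there.
    """
    all_luns = set().union(*node_luns.values())
    shared = set.intersection(*node_luns.values())
    missing = {}
    for node, luns in node_luns.items():
        gaps = all_luns - luns
        if gaps:
            missing[node] = sorted(gaps)
    return shared, missing

# Hypothetical inventory: nodeB cannot see one of the shared LUNs.
nodes = {
    "nodeA": {"600507680C8001", "600507680C8002", "600507680C8003"},
    "nodeB": {"600507680C8001", "600507680C8002"},
}
shared, missing = find_luns_not_shared(nodes)
print(missing)  # -> {'nodeB': ['600507680C8003']}: a likely fail-over risk
```

A real product would of course also verify masking/zoning and remote-site replicas; this only illustrates the shape of the comparison.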

PowerHA Server and Application Level Settings

AvailabilityGuard analyzes the configuration of the different components within the domain of the PowerHA Cluster, including LPARs, operating systems, VIO servers, volume groups, file systems, Oracle and IBM DB2 (UDB) database files and more. AvailabilityGuard verifies that the cluster configuration and the settings of each of these components are aligned and well-orchestrated. Any mismatch may lead to failed switch-overs.

Sample issues:

  • Mismatch between OS mount configuration and cluster mount resource config
  • LVM mirroring, GLVM and GPFS best practices
  • Existence of key directories/files as defined in resources (Oracle listener.ora, Apache httpDir, SYMCLI, …)
  • Resource-specific best practices (volume group, logical volume, file system, application, Service IP labels, Tape resources)
  • Server network configuration – NIC bonding, private and public network connections, etc.
  • Suboptimal VIOS/VIOC settings that create single points of failure, or risks of service disruption, partition relocation or cluster fail-over
  • VSCSI and NPIV guidelines for availability and data protection
  • Oracle, Sybase and DB2 configuration analysis
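The first item above – keeping OS mount configuration aligned with the cluster's mount resources – is a good example of such an alignment check. The sketch below uses invented, simplified data: a file system defined in a resource group must exist in the node's mount configuration, and a cluster-managed file system should not be set to mount automatically at boot.

```python
# Hedged sketch of an OS-vs-cluster mount alignment check (illustrative data,
# not a real cluster configuration).

os_filesystems = {
    "/app/data": {"vg": "appvg", "auto_mount": True},   # should be cluster-managed
    "/app/logs": {"vg": "appvg", "auto_mount": False},
}
cluster_filesystems = ["/app/data", "/app/logs", "/app/bin"]  # /app/bin undefined on OS

issues = []
for fs in cluster_filesystems:
    entry = os_filesystems.get(fs)
    if entry is None:
        issues.append(f"{fs}: defined in cluster resource but missing from OS config")
    elif entry["auto_mount"]:
        issues.append(f"{fs}: auto-mounted at boot, should be managed by the cluster")

for issue in issues:
    print(issue)
```

The same pattern – compare what the cluster expects against what each layer actually provides – applies to volume groups, service IP labels and application start/stop scripts.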

PowerHA Node Alignment

Using an intelligent comparison engine, AvailabilityGuard helps the cluster administrator identify major differences between cluster nodes. Such inconsistencies often lead to unexpected behavior during and after a cluster fail-over.

Sample issues:

  • Differences in OS version, technology level, installed products, patches, user and group configuration, kernel parameters, services, network options, configuration files, etc.
  • Differences in FC adapter settings, network adapters, time and NTP settings, etc.
  • Differences in multipath configuration – hdisk path counts, algorithm, queue depth, reserve policy and more
  • Differences in WebSphere/Weblogic/Tomcat deployments (binaries, domains, Java, etc.)
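At its core, a node-comparison pass like the one described above flattens each node's reported settings into key/value pairs and reports mismatches. The attribute names and values below are hypothetical examples of the kind of data an AIX node might report (e.g. via `oslevel` and `lsattr`).

```python
# Sketch of a two-node settings comparison (hypothetical attribute data).

def diff_nodes(node_a, node_b):
    """Return {key: (value_a, value_b)} for every setting that differs."""
    keys = set(node_a) | set(node_b)
    return {k: (node_a.get(k), node_b.get(k))
            for k in keys if node_a.get(k) != node_b.get(k)}

node_a = {"oslevel": "7200-05-03", "hdisk0.queue_depth": "32",
          "hdisk0.reserve_policy": "no_reserve", "no.tcp_sendspace": "262144"}
node_b = {"oslevel": "7200-05-01", "hdisk0.queue_depth": "20",
          "hdisk0.reserve_policy": "no_reserve", "no.tcp_sendspace": "262144"}

for key, (a, b) in sorted(diff_nodes(node_a, node_b).items()):
    print(f"{key}: {a} != {b}")
```

The hard part in practice is not the diff itself but deciding which differences matter; a real comparison engine suppresses expected per-node differences (hostnames, IP addresses) and flags only the ones that threaten fail-over.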

PowerHA Configuration Vulnerabilities

AvailabilityGuard analyzes the configuration of PowerHA itself, and verifies that it complies with IBM guidelines and with community-driven best practices. The analysis includes a comprehensive investigation of resource groups, resources, network interfaces, heartbeat management and additional components.

Sample issues:

  • Valid resource and resource dependency configuration
  • Network configuration best practices (Cluster communication redundancy, Unicast and Multicast network communication)
  • Valid states for resources, groups and systems
  • SystemMirror configuration best practices, HyperSwap, storage fencing and more
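Checks like these are naturally expressed as rules evaluated against the cluster definition. The sketch below validates a hypothetical, much-simplified cluster description against two of the best practices listed above: redundant cluster communication networks, and resource groups that can actually fail over.

```python
# Illustrative rule-based validation of a simplified, invented cluster definition.

cluster = {
    "networks": ["net_ether_01"],  # only one network: no communication redundancy
    "resource_groups": {
        "rg_db":  {"nodes": ["nodeA", "nodeB"], "service_ip": "app_svc1"},
        "rg_app": {"nodes": ["nodeA"], "service_ip": None},  # single node, no IP label
    },
}

findings = []
if len(cluster["networks"]) < 2:
    findings.append("cluster communication has no redundant network")
for name, rg in cluster["resource_groups"].items():
    if len(rg["nodes"]) < 2:
        findings.append(f"{name}: only one participating node, cannot fail over")
    if not rg.get("service_ip"):
        findings.append(f"{name}: no service IP label defined")

for f in findings:
    print(f)
```

A production rule set would draw the rules themselves from vendor documentation and field experience; the point here is only that each best practice reduces to a testable assertion over the cluster configuration.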