The VMware File System (VMFS) datastore is a high-performance storage solution for virtualized environments. When configured for full redundancy and high performance, each datastore LUN has multiple I/O paths to the storage array volume.
What Can Go Wrong?
There are several best practices for host-to-storage mapping and multipathing. Requirements and recommendations can also vary significantly between scenarios, such as single-site, single-array deployments versus multi-site and metro stretched cluster (vMSC) configurations. Examples include:
- The number of paths and the path selection policy should usually be consistent across all nodes in the cluster.
- Path configuration should be carefully selected to ensure full redundancy at the HBA, SAN fabric, and array port level.
- When path performance is asymmetric, make sure that the high-performance paths are tagged as ‘preferred’.
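The first two checks above can be sketched in a few lines of Python. This is a minimal illustration, not a VMware tool: the `Path` record, the `check_lun` function, and the `fabric` field are hypothetical names, and the sketch assumes the path inventory has already been collected from each host (for example, from the output of `esxcli storage core path list`, with fabric membership added from your own SAN documentation). It does not check the ‘preferred’ tagging.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One I/O path from a host to a LUN (hypothetical record layout)."""
    host: str
    lun: str
    hba: str
    fabric: str       # assumption: taken from SAN documentation, not from the host
    array_port: str
    policy: str       # path selection policy name, e.g. "VMW_PSP_RR"


def check_lun(paths):
    """Return a list of findings for one LUN's paths across all hosts."""
    findings = []
    by_host = defaultdict(list)
    for p in paths:
        by_host[p.host].append(p)

    # Consistency: every host should see the same number of paths
    # and use the same path selection policy.
    counts = {host: len(ps) for host, ps in by_host.items()}
    if len(set(counts.values())) > 1:
        findings.append(f"path count differs across hosts: {counts}")
    policies = sorted({p.policy for p in paths})
    if len(policies) > 1:
        findings.append(f"path selection policy differs across hosts: {policies}")

    # Redundancy: each host needs at least two distinct HBAs,
    # fabrics, and array ports for this LUN.
    for host, ps in sorted(by_host.items()):
        for layer in ("hba", "fabric", "array_port"):
            if len({getattr(p, layer) for p in ps}) < 2:
                findings.append(f"{host}: single point of failure at {layer}")
    return findings


# Example inventory (made-up names): esx02 was zoned with a single path.
paths = [
    Path("esx01", "lun-7", "vmhba1", "fabric-A", "ctrlA-p1", "VMW_PSP_RR"),
    Path("esx01", "lun-7", "vmhba2", "fabric-B", "ctrlB-p1", "VMW_PSP_RR"),
    Path("esx02", "lun-7", "vmhba1", "fabric-A", "ctrlA-p1", "VMW_PSP_FIXED"),
]
for finding in check_lun(paths):
    print(finding)
```

Running the same check for every LUN behind a datastore after each provisioning change is one way to catch an inconsistency before it turns into an outage.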
If an oversight or miscommunication introduces an inconsistency into the environment, the overall resiliency state can drop from full redundancy to a single point of failure.
Take a common scenario in which the administrator is asked to expand the capacity of an existing datastore (or create a new one) using new storage volumes. If the new volume is configured with only one array port mapping or a single I/O path to the storage array, then all the virtual machines on that datastore (often dozens or hundreds) are at risk of an outage. The failure of a single switch, port, or cable could impact large portions of the environment. Performance will suffer as well.
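To make the exposure in this scenario concrete, the following Python sketch lists the virtual machines that sit on a datastore with an under-protected volume. It is an illustration under stated assumptions: the `exposed_vms` function, the dictionary layout, and all datastore and VM names are hypothetical, and the sketch deliberately simplifies by treating a datastore as only as redundant as its least redundant volume.

```python
def exposed_vms(datastores, min_paths=2):
    """Map each at-risk datastore to the VMs it hosts.

    datastores: dict of name -> {"volume_paths": [int, ...], "vms": [str, ...]}
    where volume_paths holds the I/O path count of each backing volume.
    """
    report = {}
    for name, ds in datastores.items():
        # Simplifying assumption: the weakest volume sets the
        # redundancy level of the whole datastore.
        weakest = min(ds["volume_paths"])
        if weakest < min_paths:
            report[name] = sorted(ds["vms"])
    return report


# Made-up inventory: ds-prod-01 was expanded with a single-path volume.
inventory = {
    "ds-prod-01": {"volume_paths": [4, 1], "vms": ["web-02", "db-01", "web-01"]},
    "ds-prod-02": {"volume_paths": [4, 4], "vms": ["app-01"]},
}
for datastore, vms in exposed_vms(inventory).items():
    print(f"{datastore}: {len(vms)} VMs exposed: {', '.join(vms)}")
```

In this example the original volume still has four paths, yet the datastore is flagged because the volume added during expansion has only one.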
This misconfiguration will affect both the storage subsystem and the end-users. Examples of impacts include:
- End-users may experience slower response times and degraded performance.
- Storage array performance may be impacted, since the new volume’s I/O is not balanced across multiple array ports. Overloaded ports could also impact other systems and the VMs that depend on them.
- The IT team will waste time and resources looking for the root cause and fixing performance issues or outages. To add to the confusion, the problem may appear only intermittently and affect different VMs at different times (as a result of DRS or vMotion periodically kicking in).