Over the last few years, container technology has gained greatly in popularity and has been adopted by large enterprises, SMEs and startups alike. Its benefits make it the obvious choice for newly developed applications: it brings enterprise IT platform independence, efficient resource usage, simple deployment, rapid scaling and more. As more and more organizations rely on containers for critical applications, ensuring the stability, availability and resiliency of those applications is a must, and so is understanding how to set containers up in an optimal and resilient manner. If containers are not deployed and configured according to guidelines and best practices, the result may very well be unplanned service disruption and even outages.
Unknown dependencies: One of the key challenges in avoiding single points of failure
Containers represent an additional layer of abstraction on top of the infrastructure. With more abstraction, agility and dynamic workload changes, it becomes harder to identify the dependencies between an application and the virtual and physical infrastructure beneath it. Without understanding those dependencies, an IT organization may very well deploy an application in a way that hides single points of failure. For example, IT may believe an application is deployed across multiple failure domains when, in fact, it is running in just one. A set of containers running a given application may be configured in such a way that they all depend on a single region, single host, single blade, single enclosure, single switch or other component, either because it was initially configured that way or because “things shifted during the flight.” The implication of such hidden misconfigurations is that the application will suffer downtime when a single infrastructure component or service fails. Clearly, this is unacceptable to enterprise organizations. One way to avoid such single points of failure is to set up pod anti-affinity rules; care must be taken to set these rules up correctly, so that containers are never placed in a way that puts the application at risk of an outage.
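The kind of hidden dependency described above can be illustrated with a minimal Python sketch. It assumes a hypothetical inventory in which each replica of an application is described by the infrastructure components it runs on; the function name, the dictionary keys and the component names are all illustrative and do not come from any particular tool.

```python
from collections import defaultdict


def find_single_points_of_failure(placements):
    """Return {domain_type: component} for every domain type in which
    ALL replicas share the same component.

    `placements` is a hypothetical inventory: one dict per replica,
    mapping a failure-domain type (region, host, switch, ...) to the
    component that replica depends on. Every replica is assumed to
    list the same domain types.
    """
    components_seen = defaultdict(set)
    for replica in placements:
        for domain_type, component in replica.items():
            components_seen[domain_type].add(component)

    # A domain type with exactly one distinct component is shared by
    # every replica, so losing that component takes down all of them.
    return {
        domain_type: next(iter(components))
        for domain_type, components in components_seen.items()
        if len(components) == 1
    }


# Illustrative data: three replicas on three different hosts that
# nevertheless share one region and one switch.
example_placements = [
    {"region": "eu-west", "host": "node-a", "switch": "sw-1"},
    {"region": "eu-west", "host": "node-b", "switch": "sw-1"},
    {"region": "eu-west", "host": "node-c", "switch": "sw-1"},
]

shared = find_single_points_of_failure(example_placements)
```

In this example the replicas are spread across hosts, so a host-level anti-affinity rule would be satisfied, yet `shared` still reports the region and the switch as common to all three. In Kubernetes, the failure domain an anti-affinity rule spreads pods across is selected by the rule's `topologyKey` (for example `kubernetes.io/hostname` for hosts); a rule only protects against failures at the level it names.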
Container orchestration solutions must also be highly available
Another area of risk is the dynamic management of containers, known as container orchestration. Container orchestration solutions such as Kubernetes are critical to the resilience of the applications the containers host. Without Kubernetes, containers become stranded, unmanaged workloads: their health is not monitored, failed containers are not restarted or relocated, and configuration management and scale-up functionality are unavailable. Thus, it is essential that Kubernetes, or any other container orchestration system in use, is itself resilient and highly available. Kubernetes components (such as etcd) should run as a multi-node cluster in production to avoid single points of failure and to be able to sustain a failure without losing data.
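How many failures a multi-node etcd cluster can sustain follows from majority quorum: the cluster needs more than half of its members available to keep accepting writes. The short Python sketch below works through that arithmetic; the function names are illustrative and are not part of any etcd or Kubernetes API.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """Members that can fail while a majority is still available."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print the quorum size and fault tolerance for small clusters.
    for size in (1, 2, 3, 4, 5):
        print(
            f"{size} member(s): quorum={quorum(size)}, "
            f"tolerates {failures_tolerated(size)} failure(s)"
        )
```

A single-member cluster tolerates no failures, and neither does a two-member one. Three members tolerate one failure, and going to four adds no tolerance beyond three, which is why odd cluster sizes are the usual choice.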
Containers have significant advantages, but using them does not by itself guarantee availability and resilience. Considering how dynamic IT environments have become, it is of utmost importance to continuously assess the quality of the configuration of the container environment, including the IT components it depends on. Given the number of industry practices and how frequently they are updated, only ongoing, automated assessment is a viable solution.