Whether you are a bank, hospital, insurance company, supermarket, entertainment network, or any other provider of online services, this isn’t the time to fail your users. They need you! In these days of unpredictability, confusion and even turmoil, we encourage you to be proactive and take control over the things you can, such as the resilience of your IT environment.
Online activities are increasing dramatically. Due to the coronavirus, tens of millions of people are staying home. They’re logging into work, they’re videoconferencing, streaming movies, ordering groceries, shopping, having live math lessons, playing games and more, all from home, and at all hours.
This jump in online traffic has led to many service interruptions and outages for users worldwide. In Europe, people working from home and using Microsoft Teams to communicate saw the service go down. Though it was repaired within two hours, it went down again later, when heavy traffic from the US caused more disruption. In the UK, some O2, Three, Vodafone and EE customers could not make phone calls. In the US, internet service has been unreliable for many users working from home, and Netflix streaming went down, leaving people unable to watch TV shows or films. Globally, gamers using Xbox Live saw outages.
Why can’t sites provide 24/7 service? One major problem at present is that the high worldwide demand was not anticipated. For example, both Netflix and YouTube are lowering their streaming resolution to SD in Europe in order to “prevent the internet collapsing under the strain of unprecedented usage due to the coronavirus pandemic.” Amazon, as well, is beginning to “reduce streaming bitrates on its Prime Video service.” At this point, enterprises should examine whether their business applications can support the kind of load we’re seeing today, since the majority of environments aren’t architected and configured to accommodate the current level of demand.
A significant but avoidable cause of service disruptions and outages is misconfiguration. Under normal circumstances, misconfigurations result from the constant changes being made to IT environments, coupled with sub-optimal communication between the multiple parties responsible for implementing these changes and maintaining the environment (who did what, when, and why).
But now, the unanticipated load exposes misconfigurations you could not have predicted and forces your systems to work under extreme, never-before-tested conditions. Here are three examples of such conditions that may be causing disruptions or outages in your environment.
- In AWS or Azure, under extreme load conditions, you can scale up in another region. However, if your resource limits are not defined correctly, there might be insufficient resources in the region to which you shift the workloads, potentially causing an outage.
- The cloud providers themselves (AWS, Azure or others) may have an Availability Zone (AZ) failure from time to time. As a result, your workloads must move to another Availability Zone or Region. This transition can significantly affect your apps and may lead to severe downtime unless the failover is configured properly.
- Say you use an internet-facing load balancer to optimize availability. When you need to scale up, additional instances are added in all of the configured subnets. If some of those subnets are private by mistake, incoming traffic for them is dropped. This can lead to a severe performance issue or an outage.
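To make the first and third examples more concrete, here is a minimal sketch of the kind of check involved. It runs on plain, hand-written data instead of calling any cloud API, and the names `Subnet`, `private_subnets_behind_lb` and `has_failover_headroom` are hypothetical, chosen only for illustration. It assumes a subnet counts as public when its default route targets an internet gateway (an ID starting with `igw-` on AWS), and it treats the regional resource limit as a single number.

```python
from dataclasses import dataclass


@dataclass
class Subnet:
    """Hypothetical, simplified view of a subnet and its route table."""
    subnet_id: str
    # Destination CIDR -> target ID, as read from the subnet's route table.
    routes: dict[str, str]


def is_public(subnet: Subnet) -> bool:
    # Assumption: public means the default route points at an internet gateway.
    return subnet.routes.get("0.0.0.0/0", "").startswith("igw-")


def private_subnets_behind_lb(subnets: list[Subnet]) -> list[str]:
    """Return IDs of subnets that would drop incoming internet traffic."""
    return [s.subnet_id for s in subnets if not is_public(s)]


def has_failover_headroom(quota: int, in_use: int, incoming: int) -> bool:
    """Check whether the target region can absorb the shifted workload."""
    return in_use + incoming <= quota


# Subnets attached to an internet-facing load balancer (made-up values).
lb_subnets = [
    Subnet("subnet-a", {"0.0.0.0/0": "igw-111"}),
    Subnet("subnet-b", {"0.0.0.0/0": "nat-222"}),  # routes via NAT: private
]
print(private_subnets_behind_lb(lb_subnets))  # ['subnet-b']

# 70 instances already running, 40 more arriving, limit of 100.
print(has_failover_headroom(quota=100, in_use=70, incoming=40))  # False
```

In a real environment, the inputs would come from your cloud provider's inventory and quota data, and checks like these would need to run continuously as the environment changes.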
As always, but especially now, automation helps tremendously. To use the phrase again: “under normal circumstances,” avoiding misconfigurations in highly dynamic hybrid or cloud-based IT environments is just about impossible without a solution that can automatically detect them so they can be repaired before they lead to a disruption or outage. Today, this is truer than ever. The coronavirus is making it necessary for organizations to adapt to the realities of increased loads, extreme conditions, changing circumstances and reduced personnel. This is where our automated AvailabilityGuard NXG solution comes in.
In these days of unpredictability, confusion and even turmoil, being proactive and taking control over the things you can, such as the reliability of your AWS environment, is a must! For that purpose, and for as long as the COVID-19 crisis lasts, we are offering an unlimited free tier for all AWS customers. Using the free tier, you can scan your AWS environment and detect misconfigurations that you can fix right now to improve its reliability.
Contact us today to start protecting the reliability of your AWS environment with our AvailabilityGuard NXG™ free tier.