[vc_row][vc_column][vc_column_text]Facebook users worldwide were unable to access some of the company’s services today for about an hour. The outage affected Facebook’s 1.3 billion users and the 300 million users of Instagram, Facebook’s photo-sharing service.
Facebook users have experienced several outages in the past few months, which shows that a huge IT budget is no guarantee of smooth service availability.
Why does it happen?
Even the best IT teams are not well equipped to manage change. Facebook can hire the brightest minds in IT, but that still does not guarantee that no critical mistakes are made, as the company’s statement confirmed: “This was not the result of a third-party attack but instead occurred after we introduced a change that affected our configuration systems.”
Since change is the new normal for IT operations, it becomes impossible for humans to keep track of every change, especially when different IT teams maintain huge datacenters. We all know it is preferable to eliminate risks before they impact your business. But testing the entire infrastructure adequately every time a change is made is practically impossible to do manually.
What can be done?
These unexpected outages can be prevented with a proactive approach that pinpoints issues, eliminates risks, and provides a unified view of the infrastructure while directing the necessary information to the right team.
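To make the idea concrete, here is a minimal sketch of what automated change tracking could look like: compare two configuration snapshots, list what changed, and group the changes by the team that owns each section. It illustrates the general approach only and is not a description of how Facebook or any particular product works. The function names (`flatten`, `diff_snapshots`, `route_changes`), the snapshot layout, and the team names are all hypothetical.

```python
from typing import Any


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested config dict into dotted keys, e.g. 'web.timeout'."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            # Leaf value (an empty dict is kept as a leaf so it is not lost).
            flat[path] = value
    return flat


def diff_snapshots(before: dict[str, Any], after: dict[str, Any]) -> list[tuple]:
    """Return (path, kind, old_value, new_value) for every setting that differs."""
    old, new = flatten(before), flatten(after)
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes.append((path, "removed", old[path], None))
        elif path not in old:
            changes.append((path, "added", None, new[path]))
        elif old[path] != new[path]:
            changes.append((path, "modified", old[path], new[path]))
    return changes


def route_changes(changes: list[tuple], owners: dict[str, str]) -> dict[str, list[tuple]]:
    """Group changes by the team owning the top-level config section.

    `owners` is a hypothetical mapping from section name to team name;
    sections without an owner are collected under 'unassigned'.
    """
    routed: dict[str, list[tuple]] = {}
    for change in changes:
        section = change[0].split(".", 1)[0]
        team = owners.get(section, "unassigned")
        routed.setdefault(team, []).append(change)
    return routed


if __name__ == "__main__":
    # Made-up snapshots taken before and after a change window.
    before = {"web": {"timeout": 30, "workers": 4}, "db": {"host": "db1"}}
    after = {"web": {"timeout": 5, "workers": 4}, "db": {"host": "db1"}, "cache": {"ttl": 60}}
    report = route_changes(diff_snapshots(before, after), {"web": "frontend-team"})
    for team, items in report.items():
        print(team, items)
```

Run after every change window, a comparison like this turns "what changed?" from a manual investigation into a report that reaches the responsible team, and the unowned changes it surfaces are often the ones worth looking at first.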
Facebook’s outage last summer lasted 19 minutes and cost the company close to half a million dollars. This latest outage was bigger: it lasted longer and affected more services. While most of us aren’t Facebook, outages can still be very painful. And they can be even more costly if your customers decide to switch to a competitor that provides more reliable service.