The Fall of Facebook and the Importance of Effective Enterprise Risk Management


What Happened?

At 11:45am on October 4th, 2021, Facebook experienced a network outage that affected its services, including Instagram and WhatsApp, for nearly 6 hours. What was not apparent until after the systems returned was that the outage also affected the internal tools and systems Facebook engineers needed to diagnose and resolve the issue.

How Did This Occur?

Facebook's technical issues began with a faulty router configuration change that affected network traffic between its data centers. While modifying the configuration of its routers, the devices that coordinate communication between data centers, the company accidentally changed the “route,” or path, to somewhere that does not exist. In other words, the “map” used to link the data centers no longer matched what was actually there, so data was being sent to a “dead end.”

Once the change was made, the new configuration was pushed to the other connected systems, leading to a cascading effect that halted all of Facebook's services, including its connection to the internet and its internal network of data centers.

This effectively severed all digital communications at Facebook. While many websites and companies have experienced outages in the past, few have had them to this extent.
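
To make the “dead end” idea concrete, here is a minimal sketch of a check that walks a proposed route map before it is pushed and reports any data center that can no longer be reached. The site names, the route dictionaries, and the `unreachable_sites` function are hypothetical and are not drawn from Facebook's actual tooling; real backbone routing is far more involved.

```python
from collections import deque


def unreachable_sites(routes, start, sites):
    """Return the sites that cannot be reached from `start` using `routes`.

    `routes` maps each site to the list of sites it forwards traffic to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # A site with no entry in the route map forwards nowhere: a dead end.
        for neighbor in routes.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(set(sites) - seen)


# Hypothetical data centers and route maps, for illustration only.
SITES = ["dc-east", "dc-west", "dc-central"]

current_routes = {
    "dc-east": ["dc-central"],
    "dc-central": ["dc-east", "dc-west"],
    "dc-west": ["dc-central"],
}

# The proposed change points dc-central at a site that does not exist.
proposed_routes = {
    "dc-east": ["dc-central"],
    "dc-central": ["dc-east", "dc-nowhere"],
    "dc-west": ["dc-central"],
}

print(unreachable_sites(current_routes, "dc-east", SITES))   # []
print(unreachable_sites(proposed_routes, "dc-east", SITES))  # ['dc-west']
```

In this toy example the proposed map would be rejected because one data center becomes unreachable, which is the kind of outcome a validation step is meant to catch before a change spreads.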

How Did They Fix It?

Because of the complete loss of digital communications, including internal email (Facebook Workplace), Facebook had to send a team to reset its servers in person at a data center in California. With the network down, however, the team also had to contend with physical barriers: badges did not unlock doors at Facebook facilities because the badge authentication system relied on the internal network.

Some reports indicated that the Facebook team had to use tools to cut through doors to reach the servers and reset the network manually, which is why the problem took so long to resolve.
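
The badge problem is an example of a hidden dependency: the fallback for a network failure relied on the network itself. The following is a minimal sketch of how such dependencies could be written down and checked. The system names and the `DEPENDS_ON` table are hypothetical, chosen only to mirror the incident as described here.

```python
# Hypothetical dependency map: each system lists what it needs to function.
DEPENDS_ON = {
    "internal_network": [],
    "badge_access": ["internal_network"],
    "internal_email": ["internal_network"],
    "diagnostic_tools": ["internal_network"],
    "on_site_reset": ["badge_access"],
}


def affected_by(failed, depends_on):
    """Return every system that stops working, directly or indirectly,
    when `failed` goes down."""
    affected = set()
    changed = True
    # Keep sweeping the map until a full pass adds nothing new.
    while changed:
        changed = False
        for system, needs in depends_on.items():
            if system in affected:
                continue
            if any(n == failed or n in affected for n in needs):
                affected.add(system)
                changed = True
    return sorted(affected)


print(affected_by("internal_network", DEPENDS_ON))
# ['badge_access', 'diagnostic_tools', 'internal_email', 'on_site_reset']
```

A listing like this makes it visible that the on-site reset, the last line of defence in this example, would itself be blocked by a network failure.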

Global Impacts

With so many companies and small businesses reliant on Facebook's services, the outage stopped advertising and sales through its platforms, temporarily cutting off income for many people.

The network outage also cut off communications for WhatsApp users who rely on the platform to communicate internationally and, in some countries, as a payment service, isolating many who needed social support during COVID.

How Could It Be Prevented?

Many companies and organizations use Enterprise Risk Management (ERM) programs to prevent or control risks related to the functions of the organization. These functions include marketing, accounting and financial compliance, quality assurance, and operations management, to name a few.

What Is Enterprise Risk Management (ERM)?

ERM programs strengthen an organization's risk oversight by addressing risks before an event occurs. With the objective of developing a holistic, or all-inclusive, view of the most significant risks, an ERM program takes a top-down view of the business to identify those risks and how they are managed. These risks can be both positive and negative.

Effective ERM processes are important tools within a business. They help organizations design strategies to navigate risks and can provide a competitive advantage by reducing the likelihood that risks emerge during important strategic initiatives. Take the Facebook incident as an example: during the outage, Facebook had to rely on a competitor, Twitter, to update users on the status of the outage, which affected its image and reduced public trust in the reliability of its services.

There are five primary elements of an ERM process: strategy and objective setting, risk identification, risk assessment, risk response, and communication and monitoring. These elements are built around the core objectives of the company, or what drives business value.
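
The risk assessment and risk response elements are often supported by a simple risk register that scores each risk by likelihood and impact. The sketch below assumes a 1-to-5 scale for both and multiplies them. The scale, the threshold, the example risks, and their scores are illustrative assumptions, not values taken from any ERM standard or from Facebook.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain), assumed scale
    impact: int      # 1 (minor) to 5 (severe), assumed scale

    @property
    def score(self) -> int:
        # Simple likelihood-times-impact scoring.
        return self.likelihood * self.impact


def prioritize(risks, threshold=12):
    """Return risks scoring at or above `threshold`, highest score first."""
    flagged = [r for r in risks if r.score >= threshold]
    return sorted(flagged, key=lambda r: r.score, reverse=True)


# Hypothetical register entries with made-up scores.
register = [
    Risk("Faulty network configuration change", likelihood=3, impact=5),
    Risk("Physical access tied to internal network", likelihood=2, impact=4),
    Risk("Status updates depend on own platform", likelihood=3, impact=4),
]

for risk in prioritize(register):
    print(f"{risk.score:>2}  {risk.name}")
```

Risks that clear the threshold would go to leadership for a response decision, while the rest remain on the register for monitoring.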

Because ERM is a top-down program, responsibility lies with executive management and top leadership to set the tone, as they are ultimately responsible for understanding, managing, and monitoring the most significant enterprise risks.

Final Notes

Global industries and businesses are changing rapidly, and the complexity and number of risks are increasing, leading to a demand and expectation for more effective risk oversight. After taking a meticulous look at their current risk management approaches and whether those approaches are effective for the organization, many companies are adopting enterprise risk management to meet that expectation.

References:

  1. https://www.thestar.com/business/2021/10/05/we-are-sorry-facebook-apologizes-and-explains-mondays-global-outage.html?rf

  2. https://engineering.fb.com/2021/10/04/networking-traffic/outage/

  3. https://blog.cloudflare.com/october-2021-facebook-outage/

  4. https://www.businessinsider.com/facebook-explains-cause-of-widespread-outage-in-blog-post-2021-10

  5. https://news.sky.com/story/facebook-outage-what-actually-caused-whatsapp-and-instagram-to-go-down-12426383

  6. https://erm.ncsu.edu/library/article/what-is-enterprise-risk-management
