Event Recap

RECAP | ROUNDTABLE | Recovering from an Outage

Participants at an Uptime Institute Roundtable discussed how best to prepare to recover from an outage, with three participants sharing details of outages they had experienced. Uptime Institute consultants Scott Good and Nick Archer, both of whom had also experienced outages, contributed additional detail.

All the participants felt their organizations had made efforts to prepare for the aftermath of an outage, but most still felt unprepared. “What do we need to know that we don’t know?” asked one participant.

As discussed by Good and Archer, the most critical item is communication. In most organizations, according to participants, even well-thought-out plans do not work well if the organization communicates poorly. Most IT, networking, and facilities teams have a good understanding of their own roles in the event of an outage. However, without visibility across the organization, the different teams may struggle to restore continuity.

In one case, reliance on third-party providers made the situation worse. “They promise a lot but don’t always deliver,” said another participant. He noted that their program team offered good ideas but worked in a bubble, isolated from the groups that had to implement the plans.

It turns out that many of the steps necessary to recover from an outage are the same as those needed to prevent failures: having a deep knowledge of the facility, identifying single points of failure (e.g., single-corded servers), identifying critical loads, developing and updating procedures, scenario testing, and then refreshing the plan.

One participant learned the importance of creating a physical record of hot points in the facility only after experiencing an outage. These critical pieces of equipment are now marked in red.

Another participant noted the importance of evaluating risk. Their Tier II facilities have a 60% power reserve that allows for a smooth restart in the event of a total IT outage. But the power reserve is reduced or eliminated in their Tier III or Tier IV facilities because the likelihood of an outage is so much lower.

Even in this situation, though, the participant said that pre-planning was decisive in the company’s preparation for an outage.
