In March 2025, London’s Heathrow Airport lost power due to a fire at a substation. The entire airport, all five terminals, was closed for a whole day; 1,300 short-haul flights (and 120 long-haul flights) were cancelled, redirected or turned back. The cost has been estimated at “several hundred million pounds.”
The Heathrow closure triggered a nationwide discussion of a kind more usually heard at a data center conference: Was there sufficient capacity? Why did the alternative power sources (from two nearby substations) not pick up the load? What kind of switching was in place? Were there enough generators? What infrastructure was in front of the meter, and therefore the utility’s responsibility, and what was behind the meter?
A month later, a massive power outage across Spain and Portugal raised similar questions (with no clear answers to date). Experts and non-experts alike are scrutinizing sudden load swings, voltage and frequency sags and surges, operational technology security, cascading failures and the intermittency of renewable energy generation.
A day after the Heathrow outage, some of those caught up in the chaos began to ask why nearby data centers fed by the same substation, such as those operated by Ark Data Centres and Virtus Data Centres, continued to operate smoothly while the airport suffered catastrophic disruption. (Data centers were also largely unaffected during the outage in Spain and Portugal.)
How data centers were able to ride out the outage needs no discussion here; it is less clear why the airport, which is unequivocally part of critical national infrastructure, was not similarly protected. This is not a technical question but an economic one: despite written warnings from airline representatives, Heathrow’s owners have trusted in the reliability of the grid, saving tens of millions of pounds over recent decades. The management is known to have considered, but not invested in, automated switching, which would have enabled the airport to take advantage of spare power capacity at the two other substations. Similarly, the airport has neither universal generator coverage for critical systems nor an overall concurrent maintainability plan, and it has neither trained for nor tested against a major grid outage.
Now, under intense scrutiny, airport executives portray these as defensible decisions, given the reliability of the grid and the common strategy of other airports around the world. An Ark Data Centres planning application from December 2024 states that the reliability of the local grid was calculated to be 99.999605%: a persuasive number for accountants, if not for data center managers and engineers.
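To put that availability figure in perspective, a back-of-envelope conversion (ours, not the planning application’s) turns it into expected downtime; the sketch below assumes the figure describes long-run average availability over a standard 8,760-hour year:

```python
# Rough illustration (assumed model, not taken from the Ark planning application):
# convert the quoted grid availability figure into expected downtime per year.
HOURS_PER_YEAR = 8_760
availability = 0.99999605  # figure quoted in the planning application

expected_downtime_minutes = (1 - availability) * HOURS_PER_YEAR * 60
print(f"Expected downtime: {expected_downtime_minutes:.1f} minutes per year")
# ~2.1 minutes per year on average; real outages, however, arrive as single long
# events (Heathrow was closed for roughly a day), not as evenly spread minutes.
```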
The diligence and investment at Ark Data Centres (and other data centers) are, of course, normal practice in the data center industry. Tier III levels of resiliency are a default starting point in data center design. But this has not always been, and may not always be, the case. At various times in the past, the need for concurrent maintainability and fault tolerance has been challenged. Early commercial colocation operators (in the 1990s), for example, initially baulked at the (perceived) high cost of Tier III and Tier IV level designs. Early adopters of the Uptime Institute Tier standards at that time were from industries such as financial services, not the emerging companies hosting internet services. Only later, as colocation companies sought serious commercial customers, did they begin to build and operate highly resilient facilities.
From 2010 to 2015, as the industry pursued more efficient and more standardized designs, some designers questioned the strategy, suggesting that the Tier classification encouraged the industry to “over-design” data centers and favored resiliency over sustainability and energy efficiency. This charge was quickly countered: operators could always choose the level of resiliency according to their business needs, and customers demanded (and were willing to pay for) concurrent maintainability. Additionally, the most resilient data centers were often found to be the most energy efficient.
In the mid-2010s, a new development challenged the preference for high standards of site-level resiliency. Internet giants and some enterprises argued that workloads could be distributed across three or more data centers. Under this architecture, the failure of any single facility need not be an issue: if one failed, the other data centers (oversized to meet emergency needs) could pick up the workload. The more data centers involved, the less oversizing is required.
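The arithmetic behind that claim is simple. A minimal sketch, assuming the workload is spread evenly across N sites and the estate must survive the loss of any single site (the figures are illustrative, not any operator’s sizing methodology):

```python
# Simplified N+1 sizing model (an assumption for illustration): the workload is
# spread evenly across n_sites, and the estate must absorb one site failure.
def aggregate_overprovisioning(n_sites: int) -> float:
    """Total capacity needed across all sites, as a multiple of the workload."""
    if n_sites < 2:
        raise ValueError("distributed resiliency needs at least two sites")
    return n_sites / (n_sites - 1)

for n in (2, 3, 4, 5, 10):
    spare = (aggregate_overprovisioning(n) - 1) * 100
    print(f"{n} sites: {spare:.0f}% spare capacity across the estate")
# 2 sites: 100% | 3 sites: 50% | 5 sites: 25% | 10 sites: 11%
```

In this simple model, three sites require 50% spare capacity across the estate, while ten sites require only around 11%, which is why larger distributed fleets look attractive on paper.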
This distributed resiliency strategy has been successful and widely adopted by many major companies, including Meta, AWS and IBM. One colocation provider famously said it was unconcerned about site resiliency and that its facilities could withstand “fires, floods and fools” by spreading and switching the workloads. But this masked an important fact: operators continued to use facilities with full concurrent maintainability. Only those operators with the most homogeneous, easily distributed workloads (and the least sensitive customers) have risked using Tier II-level infrastructure (redundant components for power and cooling, but not concurrent maintainability).
This is, with hindsight, not surprising. While distributed resiliency has been shown to work effectively for many workloads, it is expensive, difficult to deploy and demands diligence, as well as complex traffic management, data duplication and synchronization techniques. As Uptime Institute outage data shows, it is also subject to software and configuration faults. Although distributed resiliency has helped the data center industry reduce overall outages, it is mostly built on top of (and not instead of) a network of Tier III level data centers.
What comes next? Today, in 2025, a series of major grid failures, coupled with strong buyer preference, has reduced the likelihood that any mainstream commercial operator will give up concurrent maintainability or forgo independence from the grid to save money. Tier III designs are here to stay. (Uptime Intelligence’s research suggests that concurrently maintainable infrastructure is favoured even for AI training, even though these energy-intensive batch workloads are not customer-facing and can, in most cases, be restarted with few problems after an outage.)
There is, however, pressure for change, and the topic of resiliency has resurfaced at some industry events. The issue is less about the difficulties of maintaining liquid cooling (although this question does arise) and more about the technical, contractual and economic relationships between utilities and data center operators as they struggle to support enormous workloads (see For a grid connection, form a disorderly line).
AI (as a workload) is the catalyst. For owners and operators, securing sufficient utility power is not the only challenge; so is supplying enough on-site generating capacity. The scale of on-site capacity required has increased from single digits to the high tens (or even hundreds) of megawatts in recent years. The complexity, cost, permitting and operational expertise required, not to mention the supply chain difficulties, are forcing data center owners to re-examine how much coverage they need, and whether it is still economic and practical to use backup generators. For many data center owners, it would be preferable to install gas or hydrogen turbines that operate continuously.
This, however, immediately changes the nature of the relationship with the utility, for which it is neither practical nor economic to be merely a backup provider. At this scale, neither the data center operator nor the utility can afford to operate entirely independently.
This is where traditional questions of resiliency come into play: data center operators have long upheld the principle of controlling their own power and level of redundancy. But for the utility companies, struggling to meet growing demand, the notion that these large generating assets can be isolated and protected (with guaranteed rights to their owners) is no longer a given; it needs to be part of a negotiation.
Does this mean that Tier design principles are about to change, or be applied more liberally? Almost certainly not: the evidence from recent examples, and from strong customer preferences, cautions against this. But it is likely that both data center operators and utilities will start to explore how the solid principles of resiliency can be applied with more intelligence and real data. They are also likely to place greater emphasis on sharing generating capacity; on measuring, categorizing, moving or even curtailing workloads; and on making more effective use of spare capacity, of which there is usually plenty.