UII UPDATE 359 | APRIL 2025

Intelligence Update

Digital twins: reshaping AI infrastructure planning

Digital twin software: Part 1

Uptime Intelligence has been observing digital twin (DT) capabilities in data center management and control software (DCM-C) for some time. While the DT concept is not new, recent advances in precision physics-based models, interactive simulations, and AI/machine learning (ML) have made DTs a more viable consideration for many operators.

Meanwhile, adapting existing facilities or building new ones for high-density IT and AI infrastructure adds to management complexity. Many operators lack the experience to design and commission high-density facilities and are uncertain about future infrastructure requirements (see The DeepSeek paradox: more efficiency, more infrastructure?).

For those planning new AI data centers or upgrading existing facilities, DTs offer the potential to design infrastructures in a virtualized sandbox environment. Performing tests, generating insights and making informed decisions can all occur before committing to significant capital infrastructure investment. Since December 2024, examples include:

  • Siemens Building X DT software is being used by Compass Datacenters in the US for its AI and cloud hyperscale campus build-out. The DT is used in the design, planning, testing and assembly of 1,500 custom medium-voltage skids over five years. These prefabricated units include the configuration and containment of electrical systems, such as switchgear and transformers. Skids may help to mitigate risk in AI campus build-outs because they are modular and movable, allowing operators to plan and adapt their infrastructure configurations and requirements depending on customer demand and other unforeseen changes, such as power availability.
  • Nvidia, Schneider Electric, Vertiv, Cadence and ETAP partnered on an AI factory DT initiative, using the Nvidia Omniverse DT platform. The aim is to predict how changes in AI workloads affect power and cooling, by testing the risks of grid failures, cooling leaks and power spikes. Future simulations will consider AI-enabled cooling optimization (Phaidra) and SCADA equipment controls (Vertech).

This is the first of two reports on DTs. It identifies the key attributes of DTs for data center applications and outlines the opportunities and challenges for operators. The second report will explore DT software product maturity.

What is a digital twin?

A DT is a software system that uses component libraries and precision sensor data to create digital replicas of physical assets in the data center. DTs employ simulations and models to test operating scenarios, make predictions and provide recommendations based on the virtual environment. They can also be used to discover hidden faults: any discrepancy between the virtual model and the real-world facility would indicate a problem with equipment, sensors or data.
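
To make the discrepancy-checking idea concrete, the minimal Python sketch below compares a twin's predicted values against live sensor readings and flags any deviation beyond a tolerance. The asset names, figures and thresholds are invented for illustration and are not drawn from any particular DT product.

    # Minimal sketch: flag discrepancies between a digital twin's predicted
    # values and live sensor readings. All asset IDs, readings and tolerances
    # are hypothetical examples.

    from dataclasses import dataclass

    @dataclass
    class Reading:
        asset_id: str
        metric: str
        predicted: float   # value the twin's model expects
        measured: float    # value reported by the physical sensor
        tolerance: float   # acceptable deviation before flagging

    def find_discrepancies(readings):
        """Return readings where the physical facility diverges from the twin."""
        return [r for r in readings
                if abs(r.measured - r.predicted) > r.tolerance]

    readings = [
        Reading("crah-03", "supply_air_temp_c", predicted=18.0, measured=18.4, tolerance=1.0),
        Reading("pdu-12", "load_kw", predicted=42.0, measured=55.3, tolerance=5.0),
    ]

    for r in find_discrepancies(readings):
        print(f"Check {r.asset_id}: {r.metric} measured {r.measured}, "
              f"twin expected {r.predicted} (possible equipment, sensor or data fault)")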

The terms visualization and simulation are often used interchangeably. However, while both rely on sensor data, they differ in objectives and capabilities.

Visualizations involve digitizing, modeling and configuring an asset or a collection of assets for monitoring, identification and capacity planning purposes. Data center infrastructure management (DCIM) software products often visualize assets, such as facilities equipment, IT servers, racks and network ports.

Simulations often use asset visualizations and their source data, and then add other application and environmental data inputs to model the impact of change under different operating conditions.
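
As a simplified illustration of that distinction, in the Python sketch below the visualization is a static asset record, while the simulation is a function that models the impact of a hypothetical change to it. All names and figures are invented.

    # Simplified sketch of the distinction: the "visualization" is static asset
    # data; the "simulation" applies a hypothetical change to it. Figures invented.

    rack = {"rack": "A02", "power_capacity_kw": 20.0, "power_draw_kw": 17.0}  # visualization data

    def simulate_added_load(rack, extra_kw):
        """Model the impact of adding IT load under current operating conditions."""
        projected = rack["power_draw_kw"] + extra_kw
        return {"projected_kw": projected,
                "headroom_kw": rack["power_capacity_kw"] - projected}

    print(simulate_added_load(rack, extra_kw=5.0))  # headroom goes negative: change is unsafe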

Key attributes of a DT include:

  • Rich contextual data. DTs rely on fine-grained asset data, including detailed information on parts, configurations, parent/child dependencies, integrations and connections. Some include supply chain, environmental and sustainability data, energy use and performance ratios. Simulations and predictions also generate new data for further analysis.
  • Open and interoperable. DT software is typically vendor agnostic, using APIs and open standards for full interoperability across third-party hardware, systems and software.
  • User-friendly and configurable simulations. DT user interfaces prioritize ease of use so that assets can be seamlessly moved in and out of the simulated environment for testing and analysis. System changes and updates may be automatically recorded in the DT for auditability and data quality purposes.
  • Scientific and algorithmic. Many DT models rely on applied physics, mathematics and data science principles to model how the equipment operates in different conditions. For example, computational fluid dynamics is a well-established physics-based approach to simulation, used to model airflows in the data center and is often used in DTs. ML algorithms are sometimes applied to model different scenarios and refine the outputs.
  • Real-time intelligence. Integrating sensor data into the DT provides the model with live operational and environmental information, such as temperatures, pressures and energy consumption. This can help engineers conduct system health diagnostics, detect early performance deterioration, and perform preventive maintenance. Operational data can also be used to perform “what if” scenario analyses (a simplified sketch follows this list).
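
The sketch below illustrates the last two attributes in miniature: a simple least-squares model is fitted to invented historical operating data and then used to answer a “what if” question about a hotter day combined with a higher IT load. Real DTs rely on far richer physics-based and ML models; this is only an illustrative example under assumed figures.

    # Minimal sketch: fit a simple data-driven model on historical operating
    # data and use it for a "what if" prediction. Figures are invented and the
    # linear model stands in for the richer physics/ML models real DTs use.

    import numpy as np

    # Historical observations: [outdoor_temp_c, it_load_kw] -> cooling_power_kw
    X = np.array([
        [10.0, 300.0],
        [18.0, 320.0],
        [25.0, 340.0],
        [30.0, 360.0],
    ])
    y = np.array([55.0, 78.0, 102.0, 125.0])

    # Least-squares fit of cooling_power ~ a*temp + b*load + c
    A = np.column_stack([X, np.ones(len(X))])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict_cooling_kw(outdoor_temp_c, it_load_kw):
        a, b, c = coeffs
        return a * outdoor_temp_c + b * it_load_kw + c

    # "What if" scenario: a hot day combined with a higher AI training load
    print(f"Predicted cooling power: {predict_cooling_kw(35.0, 420.0):.1f} kW")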

Avoiding unnecessary risks and costs

Virtualizing a physical environment for testing purposes is a key benefit of a DT. Predicting the impact of changes away from a live operational environment helps mitigate risks associated with equipment retrofits and new facility designs. DTs targeting the following outcomes will likely be the most effective.

Availability and resiliency

  • Simulating asset and system performance under different conditions can help to identify critical stress thresholds and support proactive and predictive maintenance.
  • Simulating AI infrastructure's power and cooling capacity requirements could help to identify design weaknesses and vulnerabilities for further investigation.

Economics

  • Simulating cooling equipment energy consumption and performance under different operating conditions can help identify the most efficient configuration settings.
  • Simulating server CPU utilization and IT power consumption can identify under- and over-utilized servers, stranded power and cooling capacity, and help inform decommissioning decisions (see the sketch after this list).
  • Simulating the performance of AI infrastructures and workloads under different operating conditions could help to identify areas for efficiency and optimization.
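
As an illustration of the stranded-capacity analysis mentioned above, the short Python sketch below scans an invented server inventory for decommissioning candidates and estimates power that is provisioned but rarely drawn. All hostnames and figures are hypothetical.

    # Minimal sketch: scan an invented server inventory for decommissioning
    # candidates and estimate stranded power (capacity reserved but unused).
    # All names and numbers are hypothetical.

    servers = [
        {"host": "srv-101", "avg_cpu_util": 0.03, "rated_power_w": 750, "avg_power_w": 180},
        {"host": "srv-102", "avg_cpu_util": 0.62, "rated_power_w": 750, "avg_power_w": 540},
        {"host": "srv-103", "avg_cpu_util": 0.01, "rated_power_w": 500, "avg_power_w": 120},
    ]

    decommission_candidates = [s["host"] for s in servers if s["avg_cpu_util"] < 0.05]

    # Power provisioned against nameplate ratings but never drawn in practice
    stranded_w = sum(s["rated_power_w"] - s["avg_power_w"] for s in servers)

    print("Decommissioning candidates:", decommission_candidates)
    print(f"Stranded power across sample: {stranded_w / 1000:.2f} kW")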

Challenges and considerations

Despite the potential of DTs for designing and managing complex facilities, such as those for AI and high-density IT, many enterprises and operators still need to modernize and improve their systems and processes. Addressing issues around data quality, provisioning, interoperability and cybersecurity should be immediate priorities.

Data quality

Operators considering DTs will likely need to invest significantly in data quality improvements, processes and system interoperability. Trust in the data (and, by extension, the DT outputs) is critical to gaining corporate buy-in and adoption.

Cloud or on-premises trade-offs

As with other types of data center management software, operators will likely be averse to sharing operational data with cloud-based DTs due to perceived security and confidentiality issues. However, cloud-based DT platforms, such as Microsoft’s Azure Digital Twins or Nvidia Omniverse, will likely be of significant interest to those designing new AI and hyperscale cloud data center facilities. Data derived from these DTs will also benefit future applications.

Open connectivity risks

DT value creation will depend on access to rich datasets inside and outside the data center. Restricting connectivity to internal IT and OT systems may alleviate specific external network and software security concerns. However, any restrictions will limit the value of DTs relying on shared data.

Cybersecurity

Open data and simplified integrations present security risks if data center software and systems are unpatched, unsupported or lack adequate authentication and access control. Data center cybersecurity remains a challenge for many operators. Since DTs rely on sharing data, weak cybersecurity controls could leave them vulnerable to exploits seeking to exfiltrate sensitive operational data.

The Uptime Intelligence View

DT simulations hold promise to transform outdated operational design and planning practices that often involve manual work and poor-quality data. Moreover, the ability to accurately simulate changes and identify risks and opportunities in areas such as AI and high-density IT infrastructure could deliver significant benefits for operators.

DTs depend on turning high-quality data inputs into accurate simulations and predictions. The reliance on sensor data and often outdated processes, however, means that operators need to prioritize data quality, system interoperability and cybersecurity to build and maintain trust.

 

Other related reports published by Uptime Institute include:
Pulling IT power data with software
DCIM past and present: what’s changed? 
Using optimization software for cooling and capacity gains
Data center management and control software: an overview

About the Author

John O'Brien

John is Uptime Institute’s Senior Research Analyst for Cloud and Software Automation. As a technology industry analyst for over two decades, John has been analyzing the impact of cloud migration, modernization and optimization for the past decade. John covers hybrid and multi-cloud infrastructure, sustainability, and emerging AIOps, DataOps and FinOps practices.
