UII UPDATE 362 | MAY 2025
The specialized IT equipment required to perform AI training and inference is relatively new. These devices, particularly GPUs, are expensive and need to be used effectively. Yet research literature, disclosures by AI cluster operators and model benchmarks suggest that, as with other types of IT infrastructure, GPU resources are often wasted. Many AI teams are unaware of their actual GPU utilization, often assuming higher levels than those achieved in practice.
On average, GPU servers engaged in training are operational only 80% of the time. When these servers are running, even well-optimized models reach only 35% to 45% of the compute performance that the silicon can deliver. The numbers are likely worse for inference, where the workload is dynamic and less predictable, fluctuating with the number and complexity of end-user requests.
Many factors can limit the performance and efficiency of GPUs, including network and storage throughput, the surrounding software stack and the design of the model itself.
These are just some of the factors that can affect the amount of compute delivered by GPUs, the overall power consumption of the cluster and its cooling requirements. Even in computationally intensive workloads, node-level power demand rarely approaches manufacturer-rated maximums.
Having a simple utilization metric for GPUs would be a boon for the industry; unfortunately, GPUs are unlike other server components and require new ways of accounting for performance. Potential metrics are complex but useful to understand as operators prepare for the arrival of GPUs in their data centers.
The most basic approach to defining and tracking GPU utilization looks at average server operational time. This is useful since it accounts for the AI accelerators and other server components such as CPUs, memory and storage devices.
Estimates by Lawrence Berkeley National Laboratory (LBNL) suggest that GPU servers are, on average, engaged in useful work between 75% and 85% of the time, spending the rest consuming idle power at around 20% of nameplate. This metric is of limited use to data center operators: while operational time does affect the overall energy a cluster consumes over time, it does not describe the level of sustained power required to support it.
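To illustrate how the operational-time figure feeds into energy and power estimates, the short sketch below combines the LBNL duty-cycle and idle-power figures with an assumed nameplate rating and an assumed draw while active. The 10 kW nameplate and 70% active-draw values are illustrative placeholders, not measurements from this report.

```python
# Rough average-power estimate from server operational time.
# The nameplate rating and active-draw fraction are illustrative assumptions.

nameplate_w = 10_000          # assumed nameplate rating of a GPU server (W)
duty_cycle = 0.80             # share of time doing useful work (LBNL: 75%-85%)
active_fraction = 0.70        # assumed draw while active, as a share of nameplate
idle_fraction = 0.20          # idle draw, roughly 20% of nameplate (per LBNL)

avg_power_w = nameplate_w * (duty_cycle * active_fraction
                             + (1 - duty_cycle) * idle_fraction)

print(f"Average draw: {avg_power_w:.0f} W "
      f"({avg_power_w / nameplate_w:.0%} of nameplate)")
# Note: this average says nothing about the sustained level the facility must
# provision for, which is set by the active draw, not by the duty cycle.
```

With these assumed inputs the average works out to about 60% of nameplate, which is why duty cycle alone is a poor guide to the power and cooling capacity a cluster actually needs.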
The second approach tracks individual GPU load via the tools and functionality provided by hardware designers such as Nvidia and AMD (e.g., by running the “nvidia-smi” command-line utility). This represents the most common definition of GPU utilization.
This utilization data is easily accessed and used by many observability tools, but it is not always the best metric for understanding GPU efficiency. What it measures is the share of time during which at least one of the GPU’s discrete processing elements (called streaming multiprocessors by Nvidia and compute units by AMD) is executing work. It does not distinguish between work done by the compute cores and work done moving data in and out of memory; in fact, 100% GPU utilization can be reported while the device performs little or no actual computation. Therefore, it is not a suitable metric for establishing whether a workload takes full advantage of GPU capabilities.
In addition, while essential to training and inference, memory operations have a much lower power consumption profile than compute operations. A 100% utilized GPU moving data consumes a fraction of the power that would be consumed by a 100% utilized GPU running matrix multiplication calculations (the foundation of generative AI workloads). This discrepancy makes most GPU utilization data unsuitable for power consumption estimates — and many other applications.
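On Nvidia hardware, this commonly quoted utilization figure can be sampled alongside actual power draw, which makes the gap between “busy” and “working hard” visible. The sketch below is a minimal illustration, assuming Nvidia GPUs with drivers installed and the NVML Python bindings (distributed as nvidia-ml-py and imported as pynvml); it is not drawn from this report.

```python
# Minimal sketch: sample the commonly quoted "GPU utilization" figure next to
# actual power draw, using Nvidia's NVML Python bindings (nvidia-ml-py/pynvml).
import time
import pynvml

pynvml.nvmlInit()
try:
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]
    for _ in range(10):                                  # ten one-second samples
        for i, h in enumerate(handles):
            util = pynvml.nvmlDeviceGetUtilizationRates(h)          # percent
            power_w = pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0    # mW -> W
            limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(h) / 1000.0
            # util.gpu reports the share of time a kernel was running, not how
            # hard the compute cores were working: it can read 100% while the
            # GPU draws far less than its limit (e.g., memory-bound work).
            print(f"GPU{i}: util={util.gpu:3d}% mem={util.memory:3d}% "
                  f"power={power_w:6.1f} W ({power_w / limit_w:.0%} of limit)")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

A roughly equivalent readout is available from the command line with nvidia-smi --query-gpu=utilization.gpu,utilization.memory,power.draw,power.limit --format=csv -l 1.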
The third method of accounting for GPU performance and efficiency, and likely the most objective, is model FLOPS (floating point operations per second) utilization (MFU). Initially introduced by Google Research in 2022, this metric tracks the ratio of a model’s observed throughput (measured in tokens per second and converted into floating point operations) to the theoretical maximum of the underlying hardware operating at peak throughput (as reported by the manufacturer, with no memory or communication overhead). A higher MFU indicates better efficiency, which means cheaper and shorter training runs.
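As a rough sketch of how the calculation is commonly applied to transformer training, the example below uses the widely cited approximation of about six floating point operations per parameter per token for a training step (forward and backward pass). The parameter count, token throughput and per-GPU peak figure are illustrative placeholders, not values from this report.

```python
# Sketch of a model FLOPS utilization (MFU) calculation for transformer training.
# Uses the common ~6 FLOPs per parameter per token approximation for a training
# step (forward + backward pass); all concrete numbers below are placeholders.

def model_flops_utilization(params: float,
                            tokens_per_second: float,
                            num_gpus: int,
                            peak_flops_per_gpu: float) -> float:
    """Ratio of achieved training FLOPS to the hardware's theoretical peak."""
    achieved_flops = 6.0 * params * tokens_per_second   # ~6 FLOPs/param/token
    peak_flops = num_gpus * peak_flops_per_gpu          # manufacturer-rated peak
    return achieved_flops / peak_flops

# Illustrative values: a 70-billion-parameter model on 1,024 GPUs, each rated
# at 989 teraFLOPS (roughly the dense BF16 rating of a current accelerator).
mfu = model_flops_utilization(params=70e9,
                              tokens_per_second=1.0e6,
                              num_gpus=1024,
                              peak_flops_per_gpu=989e12)
print(f"MFU: {mfu:.1%}")   # about 41% with these assumed inputs
```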
While this metric sounds convenient, it is notoriously difficult to calculate and the results are often surprising: even well-optimized models reach only 35% to 45% MFU. Why are these numbers so low? The manufacturer-specified maximum throughput ignores implementation details, whereas observed performance is shaped by the factors described earlier in this report, such as network and storage throughput, and the patchwork of software products, custom code and mathematics that makes a model out of training data.
MFU has quickly gained prominence as the target metric for model developers. Due to the physical limits in chip-to-chip communications, it will never reach 100%; results above 50% were considered state-of-the-art in early 2025.
The main drawback of MFU is that the formula involves core elements of model architecture, which can differ radically from one model to the next. It is a good indicator of whether specific features make a particular model more or less efficient, but not a perfect basis for comparing models.
Even so, MFU should interest data center operators because it has a more-or-less direct relationship with power consumption. Higher MFU generally means that more of the GPU’s resources are engaged in performing work and are therefore drawing more power. However, the correlation is not perfectly linear.
For data center operators, hardware utilization metrics are a useful indicator of the demand for operational power and cooling. LBNL predicts that, between 2024 and 2028, GPU-equipped servers will average between 60% and 80% of their nameplate power.
Making these predictions is difficult. At present, much of the GPU performance data is inconsistent. Performance maximums are either theoretical (provided by GPU designers) or obtained via benchmarks where systems are optimized to run specific software. Few operators know what “good” levels of utilization of AI infrastructure in production actually look like.
More information is required on how GPUs perform in real-world settings, and on the exact scale of the effects that various bottlenecks have on the power consumption of AI hardware clusters. Many organizations treat this information as proprietary. To make matters worse, the people who run AI workloads, and who know how those workloads behave over time, operate in small teams that are far removed from both facilities and IT operations teams. Obtaining this information for comparison and meaningful analysis will be difficult and is likely to take some time.
In the interim, operators can run their own experiments, metering the total power delivered to a training cluster or group of inference servers. Combining this data with utilization measurements will demonstrate how different AI workloads affect power consumption.
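A sketch of what such an experiment might look like is shown below. It assumes metered power is available from rack PDUs or branch circuit monitors, represented here by a hypothetical read_pdu_power_w() placeholder that an operator would implement against their own metering equipment (for example, via SNMP or Modbus), with per-GPU readings taken via Nvidia’s NVML bindings.

```python
# Sketch of a logging loop for correlating metered cluster power with GPU load.
# read_pdu_power_w() is a placeholder, not a real API: implement it against the
# site's metered PDUs or branch circuit monitors.
import csv
import time
import pynvml


def read_pdu_power_w() -> float:
    """Placeholder for a facility-side power reading, in watts."""
    raise NotImplementedError("poll your metered PDU or power meter here")


pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

with open("power_vs_utilization.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "pdu_power_w",
                     "avg_gpu_util_pct", "sum_gpu_power_w"])
    while True:
        utils = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
        gpu_power_w = sum(pynvml.nvmlDeviceGetPowerUsage(h)
                          for h in handles) / 1000.0      # mW -> W
        writer.writerow([time.time(), read_pdu_power_w(),
                         sum(utils) / len(utils), gpu_power_w])
        f.flush()
        time.sleep(10)                                    # sample every 10 s
```

Logged over full training runs or representative inference periods, data of this kind shows how far node-level draw actually sits below nameplate for a given workload mix.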
Unfortunately, as they try to understand GPUs, operators face an unusual obstacle: GPU vendors. In its data center design reference literature, Nvidia has repeatedly claimed that its systems “operate at or near peak utilization continuously when running AI workloads.” This is, at best, a half-truth.
GPU resources must be used effectively but the industry lacks the data to know what levels of performance can be deemed “effective.” The metric commonly called GPU utilization is not particularly useful for understanding efficiency, but MFU shows promise. More data about real-world deployments needs to be collected to establish what “good” looks like for an efficient AI cluster.