The specialized IT equipment required to perform AI training and inference is relatively new. These devices, GPUs above all, are expensive and need to be used effectively. Yet research literature, disclosures by AI cluster operators and model benchmarks suggest that — as with other types of IT infrastructure — GPU resources are often wasted. Many AI teams are unaware of their actual GPU utilization, often assuming higher levels than they achieve in practice.
On average, GPU servers engaged in training are operational only 80% of the time. Even when these servers are running, well-optimized models reach only 35% to 45% of the compute performance that the silicon can deliver. The numbers are likely worse for inference, where workload size is dynamic and less predictable, fluctuating with the number and complexity of end-user requests.
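Multiplying the two figures above shows how little of the theoretical peak is actually delivered end to end. A minimal sketch, using only the illustrative percentages quoted in this section (the function name and numbers are for illustration, not a standard metric definition):

```python
def effective_utilization(availability: float, compute_fraction: float) -> float:
    """Fraction of peak silicon throughput delivered overall.

    availability: share of time the GPU servers are operational
    compute_fraction: fraction of peak compute achieved while running
    """
    return availability * compute_fraction

# Figures cited above: 80% uptime, 35% to 45% of peak compute when running.
low = effective_utilization(0.80, 0.35)
high = effective_utilization(0.80, 0.45)
print(f"Effective utilization: {low:.0%} to {high:.0%}")
# prints "Effective utilization: 28% to 36%"
```

In other words, even a training cluster hitting both of the cited averages converts barely a third of its purchased compute capacity into useful work.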