In the recent report "The problem with energy per token," Uptime Intelligence examines how AI inference efficiency varies not only with accelerator hardware but also with how the infrastructure is operated over time.
The report shows that energy-per-token figures can vary significantly with throughput during active inference periods, workload characteristics and infrastructure occupancy. Published benchmark figures often reflect highly optimized, high-utilization conditions, whereas real enterprise deployments may see lower utilization, uneven demand and substantial idle periods. As a result, the same hardware can deliver very different energy and carbon efficiency depending on how it is planned and utilized.
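The effect of idle time and lower throughput on energy per token can be illustrated with simple arithmetic. The sketch below is not the report's methodology. It assumes energy per token is total energy drawn over a window (active plus idle) divided by tokens generated in that window. Every number in it (700 W active draw, 100 W idle draw, 10,000 vs 4,000 tokens/s, a 400 gCO2e/kWh grid) and every name (`Deployment`, `energy_per_token_j`, `carbon_per_token_g`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600
JOULES_PER_KWH = 3.6e6


@dataclass
class Deployment:
    """Hypothetical deployment profile over one observation window."""
    active_power_w: float     # assumed average draw while serving inference
    idle_power_w: float       # assumed average draw while powered on but idle
    active_hours: float       # hours spent actively serving requests
    idle_hours: float         # hours powered on with no inference load
    tokens_per_second: float  # assumed throughput during active periods


def energy_per_token_j(d: Deployment) -> float:
    """Total energy (active + idle) divided by tokens generated, in joules."""
    active_s = d.active_hours * SECONDS_PER_HOUR
    idle_s = d.idle_hours * SECONDS_PER_HOUR
    energy_j = d.active_power_w * active_s + d.idle_power_w * idle_s
    tokens = d.tokens_per_second * active_s  # tokens are only produced while active
    if tokens == 0:
        raise ValueError("no tokens generated in the window")
    return energy_j / tokens


def carbon_per_token_g(d: Deployment, grid_g_per_kwh: float) -> float:
    """Convert energy per token to grams CO2e using an assumed grid intensity."""
    return energy_per_token_j(d) / JOULES_PER_KWH * grid_g_per_kwh


# Same hypothetical accelerator, two operating patterns over one day.
# Assumption: the enterprise-like profile runs smaller batches, so throughput
# drops while active power stays roughly the same.
benchmark = Deployment(700, 100, active_hours=24, idle_hours=0, tokens_per_second=10_000)
enterprise = Deployment(700, 100, active_hours=6, idle_hours=18, tokens_per_second=4_000)

for name, d in [("benchmark-like", benchmark), ("enterprise-like", enterprise)]:
    print(f"{name}: {energy_per_token_j(d):.3f} J/token, "
          f"{carbon_per_token_g(d, 400):.2e} gCO2e/token")
```

With these made-up inputs, the enterprise-like profile comes out at 0.25 J/token against 0.07 J/token for the benchmark-like one, roughly 3.6 times higher, even though the hardware is identical. Idle energy has to be spread over fewer tokens, and lower throughput means each active second yields less output.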