In an Uptime Intelligence report published earlier this year utilization was shown to be the key parameter in determining whether public cloud or dedicated infrastructure was more cost-effective for a given AI workload (see Neoclouds: a cost-effective AI infrastructure alternative). With a higher level of utilization (as a proportion of total available capacity), dedicated infrastructure can have lower unit costs than public cloud. With a lower utilization level, public cloud becomes more affordable. The breakeven threshold — the minimum average utilization where dedicated becomes cheaper than cloud — was found to be 33%.
In this Uptime Intelligence report, neoclouds (a new category of cloud providers specializing in AI infrastructure as a service) were typically found to be charging less for GPUs than hyperscalers. In July, however, AWS announced significant price cuts of up to 44%, bringing its GPU-backed instance prices closer to those of neoclouds (Table 1).
Table 1 AWS GPU price cuts
Why cut prices at all? AWS’s explanation is that it now has the economies of scale to reduce costs. Greater demand for services by customers drives the purchase of more servers by AWS, which enables it to negotiate lower prices per server from suppliers such as Nvidia. These savings are passed on to their customers through lower cloud service prices.
However, Uptime Intelligence has identified additional factors influencing these price cuts:
These changing market conditions affect other types of data center operators. Those using, or planning to use, dedicated infrastructure in colocation facilities and private facilities may see capital cost reductions due to the improved availability of GPUs and the lower cost of new hardware — making older hardware less expensive. However, economies of scale and innovative automation give hyperscaler cloud providers a cost advantage that most enterprises are unable to obtain.
AWS price cuts reduce the average price for a Nvidia H100 instance across hyperscalers and neoclouds from $66 to $59 per month. Now, operators need to ensure utilization beats 38% to undercut public cloud.
These reductions make it harder for dedicated infrastructure to beat cloud on price. An operating expenses (OpEx) purchasing model means cloud users typically benefit from price cuts immediately. In a capital purchase, the amount invested remains the same.
Variable cloud prices make it challenging to build a robust comparison between AI hosting models. If dedicated infrastructure has been purchased on the assumption of stable cloud pricing, the return on investment may be less favorable if cloud costs decrease. A dedicated server purchased a few months ago with an expected lifetime utilization of 35% would have been above the 33% threshold, making it more cost-effective than a public cloud instance. Today, at a breakeven point of 38%, the public cloud is the more cost-effective option.
Is there a likelihood that cloud GPU prices will rise? It would be unusual; providers would lose significant trust, as well as sales, if they were to raise prices. A customer may consider it as price gouging if, after committing to a cloud provider for their IT estate, the provider was to increase their monthly prices without offering an easy option to migrate elsewhere. A dedicated GPU server purchase gives stability in costs and protects against unlikely — yet possible — cloud price rises.
Other cloud providers, notably Google Cloud and Microsoft Azure, are likely to follow suit by cutting their prices in response. No provider wants to be seen as more expensive and risk losing new business. These price cuts may shift the breakeven threshold further, making it more difficult for dedicated GPU infrastructure to win on price.