Cloud Giants Transition to PTU Model Amid Surge in AI Token Costs

Summary: Cloud giants are shifting from per-token billing to PTU models amid surging AI token costs. Microsoft, Amazon, and Google have announced combined spending plans exceeding $500 billion for 2026. Token usage, up roughly a thousandfold in two years, is driving the move from per-token pricing to PTU for cost control. Microsoft leverages ecosystem strength, AWS emphasizes efficiency, and Google targets performance. The transition will reshape AI value chains, affecting chipmakers and application developers.

In the spring of 2026, the earnings season for North America's cloud computing giants turned into a collective day of suffering.

Microsoft’s Intelligent Cloud division reported quarterly revenue exceeding $50 billion for the first time, with Azure growing 39% year-over-year; yet its stock plunged nearly 10% after the earnings report. Amazon’s AWS posted its fastest revenue growth in 13 quarters, but its stock dropped 11% the following day. Google Cloud’s revenue surged 48%, yet its stock still turned from gains to losses in after-hours trading.

There’s only one reason: money. Or rather, out-of-control AI bills.

Reviewing the giants' financial reports shows that Microsoft's quarterly capital expenditure reached $37.5 billion, up 66% year-over-year; although no full-year guidance was provided, analysts suggest that if the current quarterly run rate continues, annual spending could exceed $100 billion. Amazon has announced plans to spend $200 billion in 2026, while Google intends to invest between $175 billion and $185 billion, nearly double its 2025 spending.

The combined spending of the three giants exceeds $500 billion, equivalent to Norway’s entire 2024 GDP.

What are capital markets anxious about? Is it that cloud growth isn’t strong enough? Clearly not. The more large customers use cloud services, the more likely cloud providers’ bills are to “blow up.” A quiet battle over “how to charge” is unfolding in Silicon Valley—and the outcome will reshape value distribution across the entire AI industry.

01 Is the Token Model Starting to Penalize Its Heaviest Users?

An undeniable fact is that the token-based billing model has become one of the biggest drivers behind the widespread adoption of AI.

At the beginning of 2024, China’s daily average token call volume was 100 billion; by the end of 2025, it had surged to 100 trillion; in March this year, it exceeded 140 trillion, representing more than a thousandfold growth in two years.

At the same time, as viral hits like "crayfish" took off, AI shifted from "toy" to "production tool," and the drawbacks of the token model began to surface.

Taking an AI agent as an example, a traditional chatbot consumes hundreds to thousands of tokens to answer a single question. However, an AI agent capable of independently completing tasks requires multiple rounds of reasoning, repeated tool usage, and processing large amounts of context. Industry experts estimate that an agent’s token consumption can increase by dozens of times, and for complex tasks, it can reach hundreds or even thousands of times that of a regular conversation.
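The arithmetic behind that gap is easy to sketch. The numbers below (context size, round count, tokens per tool call) are purely illustrative assumptions, not measurements of any real model, but they show how re-sending an ever-growing context on each round makes an agent's bill balloon:

```python
# Illustrative sketch: why a multi-step agent burns far more tokens than a
# single-turn chatbot. All figures are assumptions for illustration only.

def chatbot_tokens(prompt: int = 500, answer: int = 500) -> int:
    """One question, one answer."""
    return prompt + answer

def agent_tokens(rounds: int = 20, tool_output: int = 1500,
                 reasoning: int = 800, base_context: int = 2000) -> int:
    """Each round re-sends the accumulated context as input; the round's
    tool output and reasoning are then appended for the next round."""
    context = base_context
    total = 0
    for _ in range(rounds):
        total += context + reasoning        # tokens billed this round
        context += tool_output + reasoning  # context keeps growing
    return total

if __name__ == "__main__":
    c, a = chatbot_tokens(), agent_tokens()
    print(f"chatbot: {c:,} tokens; agent: {a:,} tokens; ratio: {a / c:.0f}x")
```

With these assumed figures, a 20-round agent run consumes several hundred times the tokens of a single chatbot exchange, consistent with the "hundreds of times" estimates cited above.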

In March 2026, OpenAI shut down its video generation tool Sora, one reason being that it was not financially sustainable. According to Beijing News, citing analysis from SemiAnalysis, Sora’s daily operating cost approached $15 million, resulting in an annual cost of up to $5.4 billion.

The head of the OpenAI project bluntly stated: "The current economic model is completely unsustainable." Video generation consumes far more computing power than text or image generation; the GPU resources required for a single video generation could allow ChatGPT to answer dozens of questions, severely straining core business resources.

Under the token model, the more AI you use, the more your bill spirals out of control. NVIDIA CEO Jensen Huang even said that in the future, every NVIDIA engineer will have an annual token budget—and it would seem strange if a high-salary engineer didn’t use up most of their tokens in a year.

When an industry reaches the point where "the more usage, the more fear," it indicates a fundamental problem with its pricing model.

The more you use, the more I earn—sounds great. But for large clients facing wildly fluctuating AI bills, CFOs are unwilling to approve scaled budgets. The token model is punishing those it should most strive to retain: the top customers with the highest usage and deepest integration. This runs counter to cloud providers’ long-term interest in growing the overall market.

In this context, North American cloud providers have introduced a new offering: PTU, the Provisioned Throughput Unit.

In simple terms, customers pre-purchase a certain amount of compute capacity and pay a fixed monthly, quarterly, or annual fee regardless of actual token consumption. The token model is pay-as-you-go; PTU is more like an all-you-can-eat monthly buffet. Customers gain cost predictability, and cloud providers lock in retention.
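A toy break-even calculation makes the trade-off concrete. All prices and capacities below are invented for illustration; no vendor's actual rate card is implied:

```python
# Toy break-even model: pay-as-you-go token pricing vs. a fixed PTU-style
# commitment. Prices and capacity are invented; no vendor rate card implied.

PRICE_PER_1K_TOKENS = 0.01            # $ per 1,000 tokens, pay-as-you-go (assumed)
PTU_MONTHLY_FEE = 20_000.0            # $ flat fee per month (assumed)
PTU_MONTHLY_CAPACITY = 3_000_000_000  # tokens/month the commitment covers (assumed)

def monthly_cost(tokens: int, use_ptu: bool) -> float:
    """Bill for one month's token volume under either model."""
    if use_ptu:
        # Flat fee covers usage up to capacity; overflow spills to pay-as-you-go.
        overflow = max(0, tokens - PTU_MONTHLY_CAPACITY)
        return PTU_MONTHLY_FEE + overflow / 1000 * PRICE_PER_1K_TOKENS
    return tokens / 1000 * PRICE_PER_1K_TOKENS

# Volume at which the flat fee equals the metered bill.
break_even = round(PTU_MONTHLY_FEE / PRICE_PER_1K_TOKENS * 1000)
print(f"break-even: {break_even:,} tokens/month")
for t in (1_000_000_000, 2_000_000_000, 4_000_000_000):
    print(f"{t:>13,}  payg=${monthly_cost(t, False):,.0f}  ptu=${monthly_cost(t, True):,.0f}")
```

Below the break-even volume, the customer overpays for the commitment; above it, the flat fee caps the bill, which is exactly the predictability CFOs are buying.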

The underlying strategic logic has completely changed.

In token mode, the two parties are engaged in a zero-sum game: when customers save money, the cloud provider earns less; when customers spend more, the cloud provider earns more. However, large customers, fearing cost overruns, reduce their usage, causing the cloud provider’s revenue growth to fall short of expectations.

Under the PTU model, the game becomes a positive-sum one: once customers lock in their budgets, they are more willing to increase their AI usage, leading to more sustainable revenue growth for cloud providers. Essentially, risk is shifted from the customer side to the cloud provider side, in exchange for deeper customer engagement.

Guoxin Securities drew an analogy with the evolution of mobile data pricing in China.

In the 2G era, data was billed per KB at 0.01 yuan/KB, making users feel like they were bleeding money with every byte used. In the 3G era, pricing shifted to MB, with plans like 150 yuan for 3 GB, allowing users to use data more freely. In the 4G era, initiatives to “increase speed and reduce costs” spurred the explosive growth of unlimited data plans—such as Tencent’s “Da Wang Card” in 2016, offering unlimited data for 19 yuan per month with zero-rated access to Tencent apps, completely shifting users from “buying data” to “buying services.” In the 5G era, billing has further evolved into “tiered by speed,” where data volume is no longer the core pricing factor.

Each shift in billing models represents a reallocation of power within the industry. By abandoning the highly profitable "per KB" charging model, carriers enabled users' average monthly data usage to surge from 30 MB to over 10 GB, expanding the total market size by hundreds of times.

Today’s choices among cloud providers mirror those of telecom operators back then: they are willing to sacrifice short-term gross margins to secure the certainty of long-term contracts. According to a report by Guoxin Securities, the transition to PTU will shift the cloud business’s gross margin structure from “highly volatile” to “highly resilient”—implying short-term pressure but a healthier, more stable outlook in the long term.

02 Three Giants, Three Diverging Strategies

Microsoft, AWS, and Google all appear to be promoting PTU, but their underlying strategies are vastly different.

Microsoft relies on ecosystem bundling. Its weapon is the vast ecosystem built from Windows, Office 365, and GitHub. It has launched the Azure AI Commitment Program, encouraging enterprise customers to sign one- to three-year consumption-commitment contracts. This quarter, Microsoft's commercial deferred revenue surged to $625 billion, more than doubling year-over-year, with 45% coming from new agreements with OpenAI worth $250 billion.

Microsoft is playing a deeper game: the highest form of pricing power is making it impossible for customers to account for AI costs separately. When AI becomes just a button in Word, the budget naturally gets folded into the software subscription fee. However, over-reliance on a single customer has raised market concerns: should OpenAI run into financial trouble, the ripple effects would hit Microsoft directly.

AWS relies on its cost advantages. Its confidence stems from its proprietary Trainium and Inferentia chips, as well as the world’s largest cloud infrastructure scale. It actively promotes the “AI/ML Savings Plan,” offering customers significantly lower prices compared to on-demand pricing.

Amazon CEO Andy Jassy made a very strong statement during the earnings call: "Achieving a 24% year-over-year growth on an annualized revenue base of $142 billion is entirely different from competitors achieving higher percentage growth on significantly smaller bases."

AWS builds its moat through unparalleled supply chain efficiency and isn't afraid of price wars, as its unit computing cost is already the lowest in the industry. During earnings calls, executives have repeatedly emphasized that "new computing capacity can be monetized quickly," implying that it is betting on economies of scale to ultimately overwhelm all competitors.

Google, on the other hand, relies on performance premiums. With the deepest expertise in AI technology, Google has advanced its proprietary TPU to its seventh generation and boasts 750 million monthly active users for its Gemini model. In the fourth quarter, Google Cloud revenue grew by 48%, significantly outpacing its competitors.

Google has extended its committed use discounts to its AI platform to attract customers with extreme performance requirements. It is pursuing a luxury-brand strategy in technology: not chasing the largest number of customers, but ensuring that the most profitable high-end clients cannot do without it.

The partnership with Apple is a crucial step: Google has become Apple’s preferred cloud provider, and the two companies are collaborating to develop foundational models, ensuring that Google’s AI technology reaches global users through Apple devices.

Three strategies represent three distinct economic moats: Microsoft relies on switching costs, AWS on economies of scale, and Google on technological leadership. To determine which company will benefit most from the PTU transition, the key is assessing whether its type of moat can remain effective throughout the long-term contract lock-in period.

However, the impact of the PTU transition will not stop at the relationship between cloud providers and large customers; it will ripple upstream to chip manufacturers and downstream to application providers.

Upstream chip manufacturers benefit first. Under the token model, cloud providers’ compute procurement is “pulsatile”—placing emergency orders during traffic spikes and leaving resources idle during lows. Under the PTU model, long-term contracts enable cloud providers to place smoother, more predictable orders upstream.

Microsoft plans to increase its AI computing capacity by more than 80% in 2026 and to double its data center footprint over the next two years. Chipmakers such as NVIDIA will gain more breathing room in production planning, and supply chain efficiency will improve significantly.

Downstream AI application providers are accelerating consolidation; under the PTU model, after large clients secure resources, the resource pools for small and medium-sized clients may be squeezed. The entry barrier for AI startups is rising—projects that previously could launch with token-based pay-as-you-go pricing may now face higher initial costs. Meanwhile, tool-layer companies that help clients optimize PTU utilization—such as AI workload scheduling and cost management SaaS platforms—will encounter structural opportunities.
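The kind of arithmetic such a PTU cost-management tool performs can be sketched in a few lines. The hourly capacity and the demand curve below are hypothetical:

```python
# Sketch of the utilization arithmetic a PTU cost-management tool performs:
# split hourly demand into the committed pool vs. pay-as-you-go overflow.
# Capacity and demand figures are hypothetical.

PTU_HOURLY_CAPACITY = 1_000_000  # tokens/hour covered by the commitment (assumed)

def split_demand(hourly_demand):
    """Return (covered, overflow, utilization) for a demand series."""
    covered = sum(min(d, PTU_HOURLY_CAPACITY) for d in hourly_demand)
    overflow = sum(max(0, d - PTU_HOURLY_CAPACITY) for d in hourly_demand)
    utilization = covered / (PTU_HOURLY_CAPACITY * len(hourly_demand))
    return covered, overflow, utilization

# A spiky 24-hour day: quiet nights, a heavy working-hours peak.
demand = [200_000] * 8 + [1_500_000] * 8 + [600_000] * 8
covered, overflow, util = split_demand(demand)
print(f"covered={covered:,} overflow={overflow:,} utilization={util:.0%}")
```

A tool like this would flag both low utilization (the commitment is oversized) and recurring overflow (the peak should be rescheduled or the commitment enlarged), which is the structural opportunity described above.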

Guoxin Securities' research report points out that the pricing model transformation is at a turning point, and once the impact of the transformation is fully absorbed over the next year, the trend toward long-term contracts for orders will significantly enhance the certainty of revenue and profit growth.

03 Conclusion

The transition from tokens to PTU marks the AI industry's shift from an experimental phase of unchecked growth to a phase of disciplined commercial maturity.

Looking back at the evolution of mobile internet from billing by KB to unlimited data, it was the maturation of pricing models that gave rise to trillion-dollar markets such as short videos, live streaming, and cloud gaming. Today’s pain points in AI billing are paving the way for the next “Douyin-level” AI-native application.

Of course, PTU won't be the end. With the rise of Model-as-a-Service (MaaS), AI billing may further evolve toward "paying based on business outcomes." The struggle over pricing power remains the central thread for observing the evolution of the AI industry.

In this process, the true winners will be those who shift from “locking in customers” to “serving customers”—only when pricing power moves from a zero-sum game to a positive-sum win-win scenario will the rite of passage for AI commercialization be truly complete.

This article is from the WeChat public account "Market Cap Rank" (ID: shizhibang2021), authored by the Market Cap Rank team.
