Chinese AI model prices fall, widening cost gap with U.S. competitors

CoinDesk reports:

According to foreign media reports, Chinese frontier model providers have continued to cut their API prices, with DeepSeek and Xiaomi both announcing new pricing. Leading U.S. labs, by contrast, have been pricing their new models higher, further widening the inference-cost gap between Chinese and American frontier models.

For enterprise customers, model pricing shows up mainly as API costs billed per token. Once a model is integrated, customers pay for input tokens, output tokens, and cache hits, so any change in unit price directly affects the commercial viability of AI products.
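To make per-token billing concrete, here is a minimal sketch of a bill calculator. It assumes a simple three-rate model (uncached input, cache-hit input, output), with every rate in USD per million tokens. The function name `api_bill` is hypothetical, and the example cache-hit rate is a placeholder rather than any provider's published price; the input and output rates reuse the DeepSeek V4-Pro figures reported in this article.

```python
def api_bill(input_tokens: int, cached_tokens: int, output_tokens: int,
             input_price: float, cache_hit_price: float, output_price: float) -> float:
    """Return the cost in USD; all prices are USD per million tokens.

    cached_tokens is the part of input_tokens served from the cache and is
    billed at cache_hit_price instead of input_price (an assumed billing model).
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * input_price
            + cached_tokens * cache_hit_price
            + output_tokens * output_price) / 1_000_000


# 10M input tokens (6M of them cache hits) and 2M output tokens,
# at $0.435 input / $0.87 output and a hypothetical $0.05 cache-hit rate.
cost = api_bill(10_000_000, 6_000_000, 2_000_000, 0.435, 0.05, 0.87)
print(f"${cost:.3f}")
```

The cache-hit share matters as much as the headline rates: the more of a workload's input is served from cache, the more the bill depends on the cache-hit price.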

DeepSeek and Xiaomi lower prices simultaneously

On May 22, DeepSeek made its earlier 75% discount on V4-Pro permanent. After the adjustment, the model costs $0.435 per million input tokens and $0.87 per million output tokens.
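Assuming "75% discount" means 75% off the earlier list price (so customers pay 25% of it), the new permanent prices imply the pre-discount list prices computed below. This is back-of-the-envelope arithmetic on the article's figures, not an official DeepSeek price sheet; `implied_list_price` is a hypothetical helper.

```python
DISCOUNT = 0.75  # assumed: 75% off list, i.e. customers pay 25% of the list price


def implied_list_price(discounted: float, discount: float = DISCOUNT) -> float:
    """Recover the pre-discount price (USD per million tokens) from a discounted one."""
    return discounted / (1 - discount)


for label, price in [("input", 0.435), ("output", 0.87)]:
    print(label, round(implied_list_price(price), 3))
# input 1.74
# output 3.48
```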

On May 26, Xiaomi cut API pricing for MiMo-V2.5, lowering the Pro version's input cost to $0.000036 per million tokens, with some line items cut by as much as 99%. Under Xiaomi's new pricing plan, the same spend now buys 5 to 8 times as many tokens.
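The "5 to 8 times as many tokens" figure maps directly onto a price cut: if each token now costs a fraction (1 - cut) of its old price, a fixed budget buys 1 / (1 - cut) times as many tokens. The sketch below runs that arithmetic both ways; the helper names are hypothetical.

```python
def tokens_multiplier(price_cut: float) -> float:
    """How many times more tokens the same budget buys after a fractional price cut."""
    if not 0 <= price_cut < 1:
        raise ValueError("price_cut must be in [0, 1)")
    return 1 / (1 - price_cut)


def cut_for_multiplier(multiplier: float) -> float:
    """Inverse: the fractional price cut that yields a given token multiplier."""
    if multiplier < 1:
        raise ValueError("multiplier must be >= 1")
    return 1 - 1 / multiplier


print(cut_for_multiplier(5))            # 0.8
print(cut_for_multiplier(8))            # 0.875
print(round(tokens_multiplier(0.99)))   # 100
```

So 5x to 8x corresponds to an effective cut of 80% to 87.5%, while a 99% cut on a single line item means 100x as many tokens for that item.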

Price cuts driven by inference optimization

The article says the price cut is not merely a marketing move. Luo Fuli, head of Xiaomi's MiMo team and a former core developer at DeepSeek, said most of the savings come from optimizations to caching and the inference framework: the system can reuse more previously processed information, which cuts redundant computation and lowers storage and inference costs.
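As a generic illustration of what "reusing previously processed information" can mean, here is a toy prefix cache: a request only pays compute for the tokens after the longest prefix already seen. It assumes token-level granularity, unbounded memory, and no eviction; production engines typically cache KV tensors in fixed-size blocks. The `PrefixCache` class is hypothetical and is not Xiaomi's inference framework.

```python
class PrefixCache:
    """Toy prefix cache backed by a trie of token IDs (unbounded, no eviction)."""

    def __init__(self) -> None:
        self.root: dict[int, dict] = {}

    def process(self, tokens: list[int]) -> int:
        """Return how many tokens must be freshly computed, then cache the new suffix."""
        node = self.root
        cached = 0
        # Walk the trie as far as this request matches earlier requests.
        for tok in tokens:
            if tok in node:
                node = node[tok]
                cached += 1
            else:
                break
        # Insert the uncached suffix so later requests can reuse it.
        for tok in tokens[cached:]:
            node = node.setdefault(tok, {})
        return len(tokens) - cached


cache = PrefixCache()
shared_prompt = list(range(1000))               # a 1,000-token shared prefix
first = cache.process(shared_prompt + [1, 2, 3])
second = cache.process(shared_prompt + [7, 8])
print(first, second)  # 1003 2
```

The second request recomputes only its 2-token suffix because the 1,000-token prefix is already cached, which is the kind of redundant work the quoted optimizations aim to avoid.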

According to the team, the optimizations significantly increased cached-token processing capacity and cut overall storage and processing costs by approximately 80%. At the new API prices, the production inference engine can still break even when running near full capacity.
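"Breaking even near full capacity" can be read as a break-even utilization close to 1.0: the share of peak throughput that must be billed to cover the hourly cost of serving. The sketch below solves for it. Every hardware number in it (node-hour cost, peak throughput) is a hypothetical placeholder, since the article gives none, and it simplifies by billing all tokens at the $0.435 per million input rate.

```python
def break_even_utilization(cost_per_hour: float, peak_tokens_per_hour: float,
                           price_per_million: float) -> float:
    """Fraction of peak throughput that must be billed to cover hourly serving cost.

    A result above 1.0 means the price cannot cover costs even at full load.
    """
    revenue_at_full_load = peak_tokens_per_hour / 1_000_000 * price_per_million
    return cost_per_hour / revenue_at_full_load


PRICE = 0.435                         # USD per million tokens (simplification: one rate)
PEAK = 50_000_000                     # hypothetical tokens per node-hour at full load
COST_NOW = 20.0                       # hypothetical USD per node-hour after optimization
COST_BEFORE = COST_NOW / (1 - 0.80)   # same workload before an ~80% cost reduction

print(f"after:  {break_even_utilization(COST_NOW, PEAK, PRICE):.2f}")
print(f"before: {break_even_utilization(COST_BEFORE, PEAK, PRICE):.2f}")
```

With these placeholder numbers, the optimized engine breaks even at roughly 92% utilization, while the same price without the ~80% cost reduction could not cover costs at any load.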

DeepSeek takes a different route, using model architecture to reduce the computational cost of long contexts. According to the article, V4 interleaves two attention mechanisms, which substantially shrinks the KV cache and lowers per-token inference cost in long-context scenarios. At a context length of one million tokens, V4-Pro's KV cache is only about one-tenth the size of its predecessor's, and its per-token inference cost is about 27% of the predecessor's.
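To show how interleaving attention types can shrink a KV cache, the sketch below uses the standard size estimate (one K and one V vector per cached token, per layer, per KV head). It substitutes a different, well-known interleaving scheme, mostly sliding-window layers with periodic full-attention layers, because the article does not describe V4's actual mechanisms. Layer count, head count, head size, and window size are all hypothetical.

```python
def kv_cache_bytes(layer_tokens: list[int], num_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """KV-cache size: a K and a V vector for every cached token in every layer."""
    return 2 * num_kv_heads * head_dim * bytes_per_elem * sum(layer_tokens)


CONTEXT = 1_000_000
LAYERS, WINDOW = 60, 4_096   # hypothetical model depth and sliding-window size
KV_HEADS, HEAD_DIM = 8, 128  # hypothetical attention shape

# Baseline: every layer caches the full context.
dense = [CONTEXT] * LAYERS
# Hypothetical interleave: every 10th layer is global, the rest keep only WINDOW tokens.
interleaved = [CONTEXT if i % 10 == 0 else min(CONTEXT, WINDOW) for i in range(LAYERS)]

dense_bytes = kv_cache_bytes(dense, KV_HEADS, HEAD_DIM)
ratio = kv_cache_bytes(interleaved, KV_HEADS, HEAD_DIM) / dense_bytes
print(f"dense cache: {dense_bytes / 2**30:.1f} GiB per sequence")
print(f"interleaved / dense: {ratio:.3f}")
```

Because cache size scales with the number of tokens each layer retains, layers that keep only a bounded window stop growing with context length, which is why savings like the reported one-tenth show up most clearly at million-token contexts.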

U.S. model prices move the other way

The article notes that leading U.S. models have not followed with price cuts. OpenAI's GPT-5.5, released at the end of April, raised its output price to $30 per million tokens, roughly double its predecessor's. Anthropic's Claude Opus 4.7 kept its list price, but its updated tokenizer can split the same text into more tokens, which may raise actual bills by up to 35%.
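Since a bill is tokens times price, a tokenizer that produces more tokens for the same text raises costs even when the list price is unchanged. In the minimal sketch below, the 1.35 factor is the article's upper bound, while the token count and price are hypothetical; the percentage increase does not depend on the price chosen.

```python
def bill(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a token count at a price in USD per million tokens."""
    return tokens / 1_000_000 * price_per_million


old_tokens = 2_000_000                 # hypothetical tokens under the old tokenizer
new_tokens = round(old_tokens * 1.35)  # same text, up to 35% more tokens (article's upper bound)
price = 10.0                           # hypothetical unchanged list price, USD per million tokens

increase = bill(new_tokens, price) / bill(old_tokens, price) - 1
print(f"{increase:.0%}")  # 35%
```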

By comparison, DeepSeek V4-Pro scores 80.6% on the SWE-bench Verified coding benchmark, close to Claude Opus 4.6's 80.8%, yet its output price is dozens of times lower. After its latest price adjustment, Xiaomi MiMo-V2.5-Pro now matches DeepSeek V4-Pro's input and output prices.
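The article does not state Claude Opus 4.6's price, so as a rough cross-check of the "dozens of times" order of magnitude, this sketch uses the one U.S. output price the article does quote, GPT-5.5's $30 per million tokens, against the $0.87 output price shared by V4-Pro and MiMo-V2.5-Pro.

```python
# Output prices quoted in the article, USD per million output tokens.
output_prices = {
    "GPT-5.5": 30.0,
    "DeepSeek V4-Pro": 0.87,
    "Xiaomi MiMo-V2.5-Pro": 0.87,  # matches V4-Pro after Xiaomi's latest cut
}

reference = "GPT-5.5"
ratios = {
    model: output_prices[reference] / price
    for model, price in output_prices.items()
    if model != reference
}
for model, ratio in ratios.items():
    print(f"{reference} output costs {ratio:.1f}x as much as {model}")
```

The ratio comes out at about 34.5x, consistent with a gap of "dozens of times" on output pricing.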

Beyond DeepSeek and Xiaomi, the article notes that Chinese providers such as MiniMax, Moonshot AI, and Z.AI have also kept prices low. By the article's comparison, leading U.S. models cost roughly 15 to 30 times as much as leading Chinese models in the second quarter of 2026, and the gap widens further once cache discounts are factored in.
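Why would cache discounts widen the gap? A minimal sketch, under the assumption that the cheaper provider also offers a proportionally deeper cache-hit discount: blend each model's input price by its cache-hit rate and compare. All list prices and discounts below are hypothetical except the $0.435 input rate quoted earlier in the article; the function names are illustrative.

```python
def blended_input_price(list_price: float, cache_discount: float, hit_rate: float) -> float:
    """Average input price when hit_rate of input tokens are cache hits
    billed at (1 - cache_discount) times the list price."""
    cache_price = list_price * (1 - cache_discount)
    return (1 - hit_rate) * list_price + hit_rate * cache_price


def price_gap(hit_rate: float) -> float:
    """U.S./China blended input price ratio under hypothetical rates (USD per million)."""
    us = blended_input_price(10.00, 0.90, hit_rate)  # hypothetical U.S. model, 90% cache discount
    cn = blended_input_price(0.435, 0.98, hit_rate)  # $0.435 list, hypothetical 98% cache discount
    return us / cn


hit_rates = (0.0, 0.5, 0.8)
gaps = [price_gap(h) for h in hit_rates]
for h, gap in zip(hit_rates, gaps):
    print(f"cache-hit rate {h:.0%}: U.S./China input price ratio {gap:.1f}x")
```

The widening depends on the discounts differing: if both providers applied the same cache discount, each blended price would be its list price times the same factor (1 - hit_rate * discount), and the ratio would not change with the hit rate.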