Chinese AI Models Slash Costs, Outpace US Competitors in Training and Inference Efficiency

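DeepSeek trained its V3 model for roughly $5.58 million in GPU compute, a figure that covers the final training run and excludes earlier research and ablation experiments. For context, US competitors routinely spend tens to hundreds of millions of dollars on frontier-level models.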
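The headline figure is simple arithmetic: GPU hours multiplied by a rental price. The sketch below reproduces it from the numbers in DeepSeek’s V3 technical report (about 2.788 million H800 GPU hours, priced at an assumed $2 per GPU hour). The function name and the dictionary layout are illustrative, not taken from any DeepSeek code.

```python
# Hypothetical helper: training cost = GPU hours x assumed rental price per GPU hour.
def training_compute_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    return gpu_hours * usd_per_gpu_hour

# H800 GPU hours per stage, as reported in the DeepSeek-V3 technical report.
v3_gpu_hours = {
    "pre-training": 2_664_000,
    "context extension": 119_000,
    "post-training": 5_000,
}

total_hours = sum(v3_gpu_hours.values())        # 2,788,000 GPU hours
cost = training_compute_cost(total_hours, 2.0)  # the report assumes $2 per GPU hour
print(f"{total_hours:,} GPU hours -> ${cost / 1e6:.2f}M")  # 2,788,000 GPU hours -> $5.58M
```

Swapping in a different hourly rate shows how sensitive the headline number is to that single assumption.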

In May 2026, DeepSeek permanently slashed prices on its V4-Pro model by 75%. Cached input costs dropped to as low as RMB 0.025 per million tokens.

DeepSeek isn’t alone in this race to the bottom. Chinese firm 01.ai reportedly offers inference at approximately 14 cents per million tokens, placing Chinese API pricing among the lowest in the world.

Usage of Chinese AI models on OpenRouter has grown fivefold, driven almost entirely by their cost advantages over US alternatives.

How they’re doing it

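Chinese developers have built sparse mixture-of-experts (MoE) architectures that send each token through only a small subset of the model’s parameters. DeepSeek-V3, for example, has 671 billion parameters in total but activates just 37 billion for each token, which translates to compute cost reductions of 90-97% at inference time.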
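To make the idea concrete, here is a toy top-k mixture-of-experts layer in plain Python. A router scores every expert for each token, but only the `top_k` highest-scoring experts actually run, so most expert weights sit idle for any given token. Everything here (the hypothetical `ToyMoELayer` class, single-matrix experts, softmax gating over the chosen experts, the tiny dimensions) is an illustrative assumption, not DeepSeek’s actual design, which uses far more experts and its own gating scheme.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

class ToyMoELayer:
    """Hypothetical toy sparse MoE layer: each expert is a single dim x dim matrix."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one scoring vector per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        self.experts = [
            [[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        self.calls = [0] * num_experts  # how many times each expert actually ran

    def forward(self, token):
        scores = matvec(self.router, token)
        # Keep only the top_k experts; the rest are skipped entirely.
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        gates = softmax([scores[i] for i in chosen])
        out = [0.0] * len(token)
        for gate, idx in zip(gates, chosen):
            self.calls[idx] += 1
            expert_out = matvec(self.experts[idx], token)
            out = [o + gate * e for o, e in zip(out, expert_out)]
        return out, chosen

layer = ToyMoELayer(dim=8, num_experts=16, top_k=2)
rng = random.Random(1)
tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
for tok in tokens:
    layer.forward(tok)

# Only 2 of 16 experts run per token: 200 expert calls instead of 1,600.
print(sum(layer.calls), "expert calls out of", len(tokens) * 16, "possible")

# Same ratio at DeepSeek-V3 scale: 37B active out of 671B total parameters.
print(f"idle share of parameters per token: {1 - 37 / 671:.1%}")
```

The last line applies the same ratio to the 671-billion and 37-billion figures: roughly 94.5% of the weights do no work for a given token, which falls inside the 90-97% range once attention and other shared costs are set aside.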

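Beyond architecture, Chinese teams have embraced lower-precision training formats like FP8, which stores each value in 8 bits rather than the 16 bits of the common BF16 format. That halves the memory and bandwidth needed per value and lets GPUs with FP8 support, such as the H800, run matrix multiplications faster.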
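A rough way to see what FP8 gives up is to simulate its rounding. The sketch below mimics the E4M3 variant of FP8 (4 exponent bits, 3 mantissa bits, largest finite value 448) by rounding each number to a 4-bit significand after per-tensor scaling, then compares weight-storage sizes. It is a simplification under stated assumptions: it ignores subnormals and the format’s minimum exponent, the helper names are hypothetical, per-tensor scaling is simply the easiest scheme to show, and real FP8 training keeps some tensors and operations in higher precision.

```python
import math

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3
# frexp's mantissa lies in [0.5, 1); scaling by 16 leaves 8 integer steps per
# power of two, i.e. 1 implicit + 3 explicit mantissa bits.
SIGNIFICAND_SCALE = 16

def round_to_e4m3(x: float) -> float:
    """Round x to a 4-bit significand, clamped to +/-448 (subnormals ignored)."""
    if x == 0.0:
        return 0.0
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent
    # round() is round-half-to-even, matching IEEE's default rounding mode.
    return round(mantissa * SIGNIFICAND_SCALE) / SIGNIFICAND_SCALE * 2.0 ** exponent

def fake_quantize_fp8(values: list[float]) -> list[float]:
    """Per-tensor scaling: map the largest magnitude onto E4M3_MAX, round, then undo the scale."""
    amax = max(abs(v) for v in values) or 1.0
    scale = E4M3_MAX / amax
    return [round_to_e4m3(v * scale) / scale for v in values]

def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    return num_params * bytes_per_param / 1e9

weights = [0.1, -2.5, 3.14159, 100.0, -0.003]
for original, approx in zip(weights, fake_quantize_fp8(weights)):
    print(f"{original:>9} -> {approx:.5g}")

# Storing 671B parameters: 1 byte each in FP8 versus 2 bytes in BF16.
print(weight_memory_gb(671_000_000_000, 1), "GB in FP8 vs",
      weight_memory_gb(671_000_000_000, 2), "GB in BF16")
```

In this simplified model each rounded value stays within 1/16 (about 6%) of the original while storage per value halves, which is one reason FP8 is typically reserved for the bulk matrix multiplications while precision-sensitive steps stay in wider formats.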

DeepSeek’s R1 reasoning model, built on top of the V3 base model, was trained for just $294,000, using 512 H800 chips over 80 hours.

Born from restriction

Since late 2022, US export controls have restricted Chinese companies’ access to high-end Nvidia hardware, leaving the H100 and its successors effectively off-limits. Chinese developers have instead worked with the H800, a version of the H100 with reduced chip-to-chip bandwidth that Nvidia designed to comply with the original rules; a 2023 update brought the H800 under the restrictions as well.

Major Chinese players pushing this efficiency frontier alongside DeepSeek include Alibaba’s Qwen, Moonshot AI’s Kimi, Zhipu AI’s GLM, and ByteDance’s Doubao.

What this means for investors

If frontier-level AI performance is achievable at training costs under $6 million rather than $100 million-plus, the capital expenditure moat around US AI leaders starts looking thinner.

For the crypto and Web3 ecosystem, cheaper inference directly reduces the cost of running AI-powered decentralized applications, oracle networks, and on-chain analytics tools.
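For builders, the arithmetic is linear in price. The sketch below estimates a monthly API bill from a per-million-token rate. The 50-million-tokens-per-day workload and the $1 and $10 comparison prices are hypothetical, the $0.14 rate is the reported 01.ai figure, and the model ignores real-world details such as separate input and output pricing or cache hit rates.

```python
def monthly_inference_cost(tokens_per_day: float, usd_per_million_tokens: float,
                           days: int = 30) -> float:
    """Estimated monthly bill for a workload billed at one flat per-million-token rate."""
    return tokens_per_day * days / 1_000_000 * usd_per_million_tokens

# Hypothetical oracle or on-chain analytics workload: 50 million tokens per day.
workload = 50_000_000
bill = monthly_inference_cost(workload, 0.14)  # reported ~14 cents per million tokens
print(f"${bill:,.2f} per month")  # $210.00 per month

# A 75% price cut divides the bill by four, whatever the starting price.
for price in (0.14, 1.0, 10.0):  # $1 and $10 are hypothetical comparison rates
    before = monthly_inference_cost(workload, price)
    after = monthly_inference_cost(workload, price * 0.25)
    print(f"${price:>5.2f}/M: ${before:,.2f} -> ${after:,.2f}")
```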

The 90-97% compute reductions that Chinese developers are achieving through sparse MoE architectures aren’t just technical milestones. They’re price signals, and markets eventually follow price signals.