China’s AI lags behind the U.S. in access to high-end training chips and computing power.

Computing power constraints

Since the end of last year, domestic GPU companies such as Moore Threads, MetaX (Muxi), Biren Technology, and Iluvatar CoreX (TianShu Zhixin) have sparked a wave of capital interest. However, beneath the wealth bonanza in the secondary market, an overlooked undercurrent is becoming increasingly clear, and the issues it raises are growing more urgent.

Over the past few years, domestic AI chips in China have primarily focused on the relatively secure and more peripheral "inference side." For example, Doubao is planning to purchase 50,000 chips from Iluvatar CoreX for inference tasks to meet the high-frequency demands of China's largest AI app platform.

In the realm of AI training, which sits at the top of the computing power hierarchy, domestic chips are currently limited to peripheral, auxiliary tasks.

AI training chips are primarily used for training artificial intelligence models, which involves extensive matrix operations and parameter adjustments and therefore requires powerful computational capabilities and high energy efficiency. These chips, such as NVIDIA’s A100, H100, and H200 and AMD’s MI300 series, offer superior performance and command a premium price.

In comparison, inference chips have a much lighter workload. Used in the deployment phase after model training, they are primarily responsible for executing model inference tasks. These chips require high real-time performance and must deliver fast response times and low power consumption while maintaining accuracy.

A fitting analogy is that training enables an AI model to "learn knowledge," while inference enables the large model to "apply knowledge." During the learning phase, training chips must process massive datasets to dynamically update parameters in the billions, trillions, or even tens of trillions. This requires not only powerful computational capacity but also high-efficiency bandwidth and communication capabilities, as well as stability across clusters of tens of thousands of chips.
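A rough way to see why "learning" is so much heavier than "applying" is to count the memory needed per parameter. The sketch below assumes mixed-precision training with the Adam optimizer, a common setup but not one the article specifies, and it ignores activations, KV caches, and communication buffers. The function and constant names are illustrative only.

```python
# Per-parameter memory accounting: a back-of-the-envelope sketch.
# Assumption: mixed-precision training with Adam (not stated in the article).

# Inference: only a 16-bit copy of the weights is needed.
INFER_BYTES_PER_PARAM = 2

# Training: 16-bit weights (2) + 16-bit gradients (2)
# + 32-bit master weights (4) + Adam momentum (4) + Adam variance (4).
TRAIN_BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def model_state_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory for model state in GiB. Activations are deliberately ignored."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show the trend.
    for n in (7e9, 70e9, 1e12):
        infer = model_state_gib(n, INFER_BYTES_PER_PARAM)
        train = model_state_gib(n, TRAIN_BYTES_PER_PARAM)
        print(f"{n:.0e} params: inference ~{infer:,.0f} GiB, training ~{train:,.0f} GiB")
```

Under these assumptions the model state alone is eight times larger in training than in inference, which is one reason training has to be spread across large, tightly connected clusters.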

The root of the gap between Chinese and U.S. models lies in these "invisible" areas, particularly the absence of high-end training chips.

Under the scaling laws of large models, as model parameters increase, computational requirements grow linearly, and the exponentially rising costs of computing power and hardware make training large models a game reserved for only a few tech giants.
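The relationship between parameters and compute can be sketched with the widely used approximation that training a dense transformer costs about 6 × N × D floating-point operations, where N is the parameter count and D is the number of training tokens. The numbers below are assumptions for illustration, not figures from this article: 20 tokens per parameter (a common compute-optimal heuristic), a peak of 1e15 FLOP/s per chip, and 40% utilization. For sparse mixture-of-experts models, N would be the active parameter count rather than the total.

```python
# Training compute estimate: a sketch using the C ~ 6 * N * D rule of thumb.
# All default values are hypothetical round numbers.

def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * num_params * num_tokens


def compute_optimal_flops(num_params: float, tokens_per_param: float = 20.0) -> float:
    """Training FLOPs when the dataset grows in proportion to model size."""
    return training_flops(num_params, tokens_per_param * num_params)


def cluster_days(total_flops: float, num_chips: int,
                 peak_flops_per_chip: float = 1e15,
                 utilization: float = 0.4) -> float:
    """Wall-clock days on a cluster, given assumed peak speed and utilization."""
    seconds = total_flops / (num_chips * peak_flops_per_chip * utilization)
    return seconds / 86_400


if __name__ == "__main__":
    for n in (1e11, 1e12, 1e13):
        flops = compute_optimal_flops(n)
        days = cluster_days(flops, num_chips=10_000)
        print(f"N={n:.0e}: ~{flops:.1e} FLOPs, ~{days:,.0f} days on 10,000 chips")
```

With a fixed dataset, compute grows in proportion to parameter count. If the data is scaled up alongside the model, as in this sketch, a tenfold increase in parameters implies roughly a hundredfold increase in compute, which is why the bill escalates so quickly.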

Among U.S. tech giants, Meta alone plans to deploy over 1.2 million high-end GPUs by the end of 2026, with annual investments exceeding $145 billion; according to estimates, Google’s total AI computing power is equivalent to 5 million NVIDIA H100 chips, accounting for one-quarter of the global total.

Amazon, Microsoft, Alphabet, and Meta are projected to spend a staggering $725 billion on capital expenditures this year, a 77% year-over-year increase—equivalent to 13% of the United States’ total private domestic investment for the year. Morgan Stanley further predicts that U.S. tech companies’ capital spending could reach a record $1.1 trillion by 2027.

The United States currently controls over 70% of the world’s high-end GPUs; after the chip export restrictions, domestic access to high-end chips is only one-eighth of that in the U.S. According to the Stanford AI Index Report 2026, the number of data centers in the United States (5,427) is more than ten times that of China.

According to calculations by the China Academy of Information and Communications Technology (CAICT), as of early 2025, U.S. computing power capacity stood at 2,400 EFLOPS, more than double China's 1,053 EFLOPS.

The computing power held by each of the four tech giants listed above exceeds the combined computing power of all AI companies in China.

This overwhelming computational advantage enables U.S. companies to complete more than a dozen large model iteration experiments within a year.

Elon Musk has gone even further—his company xAI owns Colossus 2, touted as the world’s first GW-class AI cluster. This gives him the confidence to claim that he is simultaneously training seven models: two with one trillion parameters, two with 1.5 trillion, one with six trillion, and one with ten trillion parameters. Such a display of brute-force power is only possible with extremely abundant computational resources.

Meanwhile, due to U.S. restrictions on chip exports, Chinese companies' share of high-end AI chip shipments has continued to decline over the past few years, according to Epoch AI.

It is no exaggeration to say that this vast gap in computing power will keep China’s AI in catch-up mode for the long term and make it even harder for domestic large models to close the distance with their U.S. counterparts.

Generational difference

"China's pace of innovation is unstoppable. Anyone who thinks China cannot produce (chips) is seriously mistaken. The gap between China and the U.S. is measured only in nanoseconds."

NVIDIA founder Jensen Huang has repeatedly praised China's advancements in semiconductors in public appearances.

Elon Musk also frequently expresses similar views on X: “China will definitely solve its chip supply constraints and will far surpass all other countries in AI computing power,” and “China will win the global AI race.”

Praise for China’s AI development from such highly respected tech leaders is easy to take at face value, but these statements smack of excessive flattery. Some U.S. media outlets also keep promoting the narrative that the gap between Chinese and American models is minimal, attempting to muddy the facts and obscure certain objective truths.

In response, all domestic AI-related fields should remain clear-headed and calm.

If China's advanced large models today show little difference from their U.S. counterparts in solving standardized problems, the gap becomes much more apparent in complex industrial and enterprise environments.

Compared to cutting-edge models from companies like Anthropic in the U.S., China remains a follower. According to the U.S. CAISI assessment, China’s strongest model, DeepSeek V4 Pro, lags behind U.S. frontier models by approximately eight months.

Li Ka-shing recently told The Wall Street Journal that, with top U.S. models like Anthropic’s Claude 3.5 as the benchmark, the United States is currently about 15 months ahead of China.

Large models follow the Scaling Law: the larger the number of parameters, the more training data, and the greater the computational resources invested, the better the model's performance. Today, the most advanced large models in the United States have entered the era of ten trillion parameters, and their iteration speed continues to accelerate.
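One published form of this relationship is the parametric loss fit from Hoffmann et al. (2022), often called the Chinchilla scaling law: L(N, D) = E + A / N^alpha + B / D^beta. The sketch below uses the constants reported in that paper for its own model family. They are illustrative only and are not measurements of any model named in this article.

```python
# Parametric scaling-law sketch: L(N, D) = E + A / N**alpha + B / D**beta.
# Constants are the fitted values reported by Hoffmann et al. (2022);
# they describe that paper's models, not the ones discussed here.

def scaling_law_loss(num_params: float, num_tokens: float,
                     E: float = 1.69, A: float = 406.4, B: float = 410.7,
                     alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pre-training loss for N parameters trained on D tokens."""
    return E + A / num_params**alpha + B / num_tokens**beta


if __name__ == "__main__":
    tokens = 2e13  # hypothetical fixed dataset size
    for n in (1e10, 1e11, 1e12, 1e13):
        print(f"N={n:.0e}: predicted loss {scaling_law_loss(n, tokens):.3f}")
```

Two features of this curve matter for the argument here. Loss keeps falling as parameters and data grow, and each further tenfold increase buys a smaller improvement than the last, so staying at the frontier demands ever larger amounts of compute.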

Anthropic’s most powerful Mythos model has reached 10 trillion parameters, requiring $10 billion to train; xAI’s Colossus 2 is currently training seven models simultaneously, including 6-trillion- and 10-trillion-parameter models; OpenAI can iterate through a 4-trillion-parameter model in just one month.

China's most powerful model, DeepSeek V4 Pro, has a total of 1.6 trillion parameters, roughly one-sixth the size of the ten-trillion-parameter frontier models in the United States.

Anthropic's Claude series has been widely recognized as the strongest AI coding model family over the past two years, and Mythos has once again surpassed public expectations, demonstrating even greater performance than the previous flagship, Opus 4.6.

OpenBSD is renowned in the industry as the most secure system, yet Mythos discovered a vulnerability that had gone undetected for 27 years. It also found vulnerabilities in FFmpeg and the Linux kernel that had remained unnoticed for years or even decades—all autonomously, without human assistance.

Keep in mind that the pre-training of large models determines their upper performance limit—post-training cannot elevate a trillion-parameter model to match the capabilities of a ten-trillion-parameter model. The key factor in pre-training is high-end computing chips, which determine both the parameter scale and the speed of training iterations.

Liu Qingfeng, Chairman of iFlytek, frankly admitted that leading large-model developers, especially U.S. giants, are all building ultra-large-scale computing platforms. Meanwhile, domestic computing power is going through a difficult phase, which limits training on extremely long text contexts.

Clearly, the disparity in computing power is the root cause of the difference between Chinese and U.S. models.

Domestic rise

A single company controls 90% of the global market for high-end AI training chips, which has enabled NVIDIA to maintain its position as the world’s most valuable company. At its peak, its total market capitalization exceeded the 2025 GDP of Germany, the world’s third-largest economy.

According to data from TrendForce, in Q1 2026, NVIDIA accounted for 68% of the global GPU server market, AMD held 5%-6%, and domestic GPU manufacturers collectively accounted for less than 4%.

Leveraging its first-mover advantage, formidable technological barriers, high-speed interconnectivity, robust software ecosystem, and close partnership with TSMC’s advanced processes, NVIDIA dominates the market. In high-end training scenarios, NVIDIA’s GB300 outperforms AMD’s MI325, as well as Cambricon’s Siyuan 690 and Moore Threads’ MTT40—particularly in training trillion-parameter large models, where it exceeds competitors by more than 30%.

Under the export ban, Jensen Huang previously stated that NVIDIA’s share of new sales in China has essentially dropped to zero, leaving only its installed base. Supported by domestic substitution policies, products such as Huawei’s Ascend 910, Hygon’s DCU Shen Suan No. 2, and Cambricon’s MLU370/590, along with chips from Moore Threads and MetaX (Muxi), have emerged in succession.

The Ascend 910 is Huawei's most powerful computing chip, with the Ascend 910B delivering 640 TOPS (INT8) performance, comparable to NVIDIA's A100 chip.

In terms of absolute performance, domestic GPUs still lag behind, but they can initially focus on inference and edge computing scenarios. Currently, domestic GPUs largely meet the general inference needs of domestic government and enterprise clients, with the performance gap compared to NVIDIA’s mid-range products narrowed to 15%-20%, making substitution feasible.

It should be emphasized that while computational performance is important, the underlying software ecosystem is the weak point of domestic GPUs. As Academician Zheng Weimin of the Chinese Academy of Engineering pointed out, the core issue with domestic AI chips is their inadequate ecosystem; if the ecosystem were strong, users would still adopt a chip that delivered only 60% of the performance.

The software ecosystem represents the most formidable barrier in the GPU space, and NVIDIA’s capabilities in this area are equally irreplaceable.

After more than a decade of dedicated development, the CUDA ecosystem now boasts over 4 million developers, hundreds of thousands of open-source models, and a comprehensive suite of third-party tools, covering AI training, inference, graphics rendering, and scientific computing—establishing an unmatched ecosystem advantage.

According to IDC data, more than 95% of AI models worldwide are developed on the CUDA ecosystem. Domestic GPUs, supported by policy initiatives, require long-term collaboration with the industrial chain, as well as sufficient patience from the media and capital markets.

In January this year, Zhipu, in collaboration with Huawei, open-sourced the next-generation image generation model GLM-Image. The model completed an end-to-end workflow from data processing to model training using Huawei Ascend Atlas 800T A2 devices and the Ascend MindSpore AI framework, making it the first SOTA multimodal model trained entirely on domestic chips.

Moore Threads, in collaboration with the Beijing Academy of Artificial Intelligence, completed the full-process training of the self-developed embodied intelligence model RoboBrain 2.5 using the MTT S5000 AI computing cluster and the FlagOS-Robo framework. This achievement marks the first validation of the feasibility of domestic computing clusters for training large models in embodied intelligence.

It is evident that domestic GPUs have made significant progress in compatibility and ecosystem development, moving from isolated breakthroughs in inference to gradual adaptation in training—a substantial advancement.

Summary

Overall, in the context of restricted imports of advanced overseas chips, it is advisable to adopt a hybrid approach—leveraging both domestic and international resources—while prioritizing support for domestic computing chips to meet urgent market demands.

The demand is unquestionably real; the "bubble theory" persists, but it has not gained ground. Global market enthusiasm for AI development has surpassed that of any previous industry in its early stages.

This year, global capital markets have once again entered a super AI cycle, with stocks of Samsung, SK Hynix, Broadcom, and TSMC reaching new all-time highs. In the domestic market, hard tech companies such as Cambricon have also seen strong price gains, with the optical module leader InnoLight surpassing Kweichow Moutai in market capitalization at one point.

The history of South Korea's semiconductor industry is instructive: the country mobilized national resources to support its memory chip sector, endured its darkest moments, and ultimately surpassed Japan to become the absolute global leader in memory manufacturing.

Whether it’s memory chips, smartphone chips, or today’s AI chips, China is still in the process of catching up, a feat that cannot be achieved overnight. However, backed by China’s vast market, a growing pool of AI talent, and substantial capital resources, domestic GPUs are beginning to demonstrate promising compatibility and are capable of addressing many real-world needs of AI companies.

In this AI chess game over national destiny, China and the United States are rivals, yet each possesses technologies, markets, and resources that the other needs.

This article is from the WeChat public account: Ju Tao WAVE. Edited by Yang Xuran. Authored by Xie Zefeng. Original title: "The Computing Power Challenge Amid the U.S.-China AI Contest | Ju Tao"