Domestic AI companies are beginning to chart their own paths.
Since the start of this year, the global tech community has been closely watching China's computing power landscape.
In January, Musk stated on a podcast that China would "far surpass the rest of the world" in AI computing power. In February, OpenAI’s CEO Altman said China’s technological advancements in artificial intelligence are "astonishingly fast." NVIDIA’s CEO Jensen Huang has also repeatedly stated publicly: "Restricting China’s AI technology will only accelerate its self-developed innovations."
2025 can be regarded as the year of consolidation on the supply side. Domestic GPU companies such as Moore Threads and Muxi Semiconductor have successively entered the capital market, further strengthening the industrial foundation for domestic large models. In 2026, these changes have propagated downstream along the value chain, with multiple domestic large models releasing new versions in late April.
On April 20, Moonshot AI launched the Kimi K2.6 model, optimized for long-form code generation; on April 24, DeepSeek V4 was released; subsequently, Meituan’s LongCat-2.0-Preview opened for testing. The latter two models each exceed one trillion total parameters and support ultra-long contexts of up to 1M tokens.
Notably, DeepSeek V4 has successfully migrated and adapted from the NVIDIA ecosystem to Huawei's Ascend platform; meanwhile, Meituan's LongCat-2.0 is a trillion-parameter large model trained and served entirely on domestic computing power, running on 50,000 to 60,000 domestic AI chips.
For a long time, Chinese AI practitioners generally adopted the strategy of leveraging existing mature solutions. Now, domestic AI companies are beginning to forge their own paths.
Building roads in the wilderness
How do you accomplish a difficult task?
Science fiction author Arthur C. Clarke's answer was: "The only way to discover the limits of the possible is to go beyond them into the impossible."
DeepSeek V4 underwent multiple schedule adjustments from its initial planning to final release. Outside observers widely speculate that one reason was the need to migrate the core code away from NVIDIA's CUDA.
The CUDA ecosystem, after more than a decade of refinement, has become a powerful and fully equipped development platform. The domestic computing ecosystem is still in its early stages of development. Migrating code means that development teams must undertake extensive restructuring of underlying frameworks.
Ultimately, DeepSeek achieved this: two days after the release of V4, JPMorgan Chase noted in a report that V4 successfully adapted to Huawei's Ascend chips, validating the feasibility of domestic computing power in cutting-edge AI inference; furthermore, DeepSeek significantly reduced inference costs through foundational technological innovations such as its hybrid attention architecture.
DeepSeek achieved cost reduction and efficiency gains in a technically rigorous way, completing a demanding migration whose workload was comparable to rewriting half of a large model. On the same day, Meituan’s LongCat-2.0-Preview opened for testing, running directly on domestic computing hardware.
What are the engineering challenges of domestic computing power? Let’s examine them using LongCat-2.0-Preview as an example.
The first challenge is physical: domestic accelerators differ from NVIDIA chips in memory capacity and bandwidth, which posed significant engineering challenges for Meituan’s team in training and deploying a trillion-parameter model, demanding substantial effort to tune parallel strategies and optimize memory usage.
The second challenge is the maturity of the software ecosystem. To ensure precise, reproducible training end to end, the team had to rewrite and optimize core operators for the characteristics of domestic chips and develop fully deterministic operators in-house.
The third challenge is the stability of a cluster of tens of thousands of accelerators. With 50,000 to 60,000 domestic AI accelerators running in a single massive cluster, hardware failures are inevitable, so the team built a comprehensive fault-tolerance and automatic-recovery system.
Finally, the team co-designed the training framework and model architecture around the characteristics of domestic hardware, overcoming the limitations of general-purpose frameworks and improving computational performance.
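The fault-tolerance idea behind these challenges can be sketched in miniature. The toy loop below (illustrative only: the function name, failure rate, and checkpoint interval are invented for the example and are not Meituan's actual system) checkpoints periodically and automatically resumes from the last checkpoint after a simulated hardware failure, so a crash loses only the work since the last save:

```python
import random

def train_with_recovery(total_steps, checkpoint_every, fail_prob=0.05, seed=0):
    """Toy training loop: checkpoint periodically and auto-resume after a
    simulated hardware failure, instead of restarting the whole run."""
    rng = random.Random(seed)
    checkpoint = {"step": 0, "loss": 1.0}   # last durable state
    restarts = 0
    while checkpoint["step"] < total_steps:
        step, loss = checkpoint["step"], checkpoint["loss"]  # resume point
        try:
            while step < total_steps:
                step += 1
                loss *= 0.999                    # pretend the model improves
                if rng.random() < fail_prob:     # simulated accelerator failure
                    raise RuntimeError("accelerator lost")
                if step % checkpoint_every == 0:
                    checkpoint = {"step": step, "loss": loss}
            checkpoint = {"step": step, "loss": loss}
        except RuntimeError:
            restarts += 1                        # roll back to last checkpoint
    return checkpoint, restarts

final, restarts = train_with_recovery(total_steps=200, checkpoint_every=20)
print(f"finished at step {final['step']} after {restarts} automatic restarts")
```

Real systems add hot-spare nodes, asynchronous checkpoint writes, and failure detection, but the rollback-and-resume loop is the core pattern.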
DeepSeek's algorithmic optimizations have lowered the barrier to computational power and reduced model costs; Meituan's engineering practices have demonstrated the feasibility of domestic chips. These efforts have also accumulated engineering capabilities and experience for China's chip ecosystem.
Liang Wenfeng once said, “We didn’t set out to be a catfish—we just accidentally became one.” Now the “catfish effect” is evident, and DeepSeek is not alone.
From a single point to a system
Tang Daosheng of Tencent Cloud once offered this analogy: “The large model is the engine, and the user is the driver.” Users easily notice the engine’s performance, but an excellent driver understands that fuel and the chassis are equally important.
The development of China's computing power relies on coordinated progress across the entire industrial chain, with core enterprises in each segment continuously addressing their weaknesses.
On the manufacturing side, public data shows that China’s chip production continues to rise, but it has a “dumbbell-shaped” structure, with mature processes above 28nm dominating, while capacity for advanced processes at 14nm and below remains scarce.
Faced with the absence of EUV lithography machines, companies such as SMIC and Hua Hong Semiconductor are advancing multiple patterning technologies to find a balance within physical limits. Multiple reports indicate that SMIC’s N+2 process (equivalent to 7nm) has achieved a yield exceeding 80%, signifying that it has crossed the threshold for commercial mass production.
On the computing power side, domestic chips still lag behind NVIDIA in single-card performance, but practice with products such as Huawei’s Ascend 910C shows that massive model training can still be achieved by pushing the cluster’s linear acceleration ratio (scaling efficiency) to the extreme.
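The trade-off behind the "linear acceleration ratio" is simple arithmetic. In the toy calculation below, every number is invented for illustration and does not describe any real chip: a card with lower single-card performance can match a stronger one at the cluster level if it is deployed in larger numbers and the scaling efficiency stays high.

```python
def cluster_throughput(per_card_tflops, num_cards, scaling_efficiency):
    """Effective cluster throughput: single-card performance x card count x
    the linear-scaling efficiency of the interconnect and software stack."""
    return per_card_tflops * num_cards * scaling_efficiency

# Hypothetical numbers: 2.5x more weaker cards at the same scaling
# efficiency yield the same effective throughput (~9.0e6 TFLOPS here).
strong = cluster_throughput(per_card_tflops=1000, num_cards=10_000,
                            scaling_efficiency=0.90)
weak = cluster_throughput(per_card_tflops=400, num_cards=25_000,
                          scaling_efficficiency=0.90) if False else \
       cluster_throughput(per_card_tflops=400, num_cards=25_000,
                          scaling_efficiency=0.90)
print(strong, weak)
```

The catch, of course, is that scaling efficiency tends to fall as clusters grow, which is why interconnects and parallelization software matter as much as the chips themselves.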
Whoever controls the ecosystem controls the world. A key reason NVIDIA’s CUDA has built such a deep moat is that it has established a universal standard for software and hardware compatibility.
Industry players are aware of this too. Cambricon, for example, has launched a foundational software platform compatible with mainstream frameworks, lowering the migration barrier for developers. The Beijing Academy of Artificial Intelligence has led the development of an open-source system that provides a unified underlying interface, enabling upper-layer models to run on a variety of domestic chips.
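The "unified underlying interface" idea is essentially a hardware-abstraction layer: model code targets one interface, and each chip vendor supplies its own backend. A minimal hypothetical sketch (the class and function names here are invented for illustration and are not the Academy's actual API):

```python
from abc import ABC, abstractmethod

class DeviceBackend(ABC):
    """Hypothetical hardware-abstraction layer: upper-layer model code calls
    this interface; each vendor implements it for its own chip."""
    @abstractmethod
    def matmul(self, a, b):
        ...

class ReferenceBackend(DeviceBackend):
    """Pure-Python reference implementation standing in for a vendor backend."""
    def matmul(self, a, b):
        # naive row-by-column matrix multiply
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]

def run_model(backend: DeviceBackend):
    # Model code never names a specific chip; swapping the backend
    # requires no change here.
    return backend.matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])

print(run_model(ReferenceBackend()))  # → [[19, 22], [43, 50]]
```

Real systems expose hundreds of operators plus memory and stream management, but the design principle is the same: one stable contract above, many interchangeable implementations below.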
Major domestic internet companies are also taking action: Baidu’s dual-track strategy and ByteDance’s hundred-billion-yuan investment are both seeking better solutions for computing power infrastructure.
According to publicly available data, Meituan has invested over the past few years in at least 21 companies across the semiconductor/intelligent hardware and general large model sectors, including Moore Threads and Musen Technology in chip computing power, Axera Technologies in vision chips, and multiple enterprises in niche areas such as new materials, among them Guangzhou Zhongshan and Dongfang Suangxin.
Beyond tracking technical progress, industrial capital is also playing the role of investor and co-builder in computing power, gradually forming a positive feedback loop.
From the digital world to real-world tasks
Artificial intelligence is currently at a critical turning point in its third wave, with large models driving it from narrow AI toward general AI. More importantly, they are propelling robots from the 1.0 era of specialized robots into the 2.0 era of general embodied intelligence.
Wang Zhongyuan, President of the Beijing Academy of Artificial Intelligence, emphasized that the key application of AI capabilities lies in the physical world.
On one hand, numerous domestic manufacturers are working to enable large models to “read ten thousand books” in the cloud, enhancing their intelligence and the rigor of their logical reasoning. On the other hand, they are also ensuring these models “travel ten thousand miles”—for example, the Wenxin large model has been integrated into autonomous driving decision systems, while the Hunyuan large model’s industrial quality inspection solutions are now deployed across multiple production line scenarios.
Meituan’s food delivery, in-store services, and travel and hospitality businesses form the most complex task-execution network in daily life, encompassing countless real-world scenarios: from order-fulfillment speed in restaurant kitchens, to riders navigating delivery routes in heavy rain, to a user’s late-night craving for hot pot.
Wang Xing has explicitly stated that the Meituan app should be the first to be upgraded into an "AI-powered app." This means that LongCat’s training goal is not only to answer questions like “Which restaurant serves the best stir-fried pork,” but also to “find that restaurant, select the best group-buy voucher, and book two seats for 7 p.m. on Friday.”
This makes the effectiveness of task completion paramount, and it explains why Meituan emphasizes building an AI foundation for the physical world.
From parameter scaling to computational power optimization, domestic large models are advancing from “usable” to “highly usable.”
There are no shortcuts on this path. In the future, as algorithms, computing power, capital, and use cases continue to interact synergistically, China’s AI story will move from “point breakthroughs” to the chapter of “systemic evolution.”
This article is from the WeChat public account "Lan Dong Business," authored by Yu Weilin.
