DeepSeek V4 has finally launched. This is a moment that has been awaited for nearly five months. The flagship MoE model with 1T parameters, along with a 285B-parameter Flash version, is now available, followed closely by the full 1.6T Pro version—all fully open-sourced on GitHub under the Apache 2.0 license, with weights and deployment code released simultaneously.
As soon as the model was released, capital markets responded in three distinct but interrelated ways.
Diverging reactions across the capital markets
Almost all stocks on the A-share computing power chain surged. Cambricon posted its 11th consecutive trading day of gains, rising 3.7% today and accumulating over 60% this month. Hygon Information hit the 10% daily limit intraday and closed up 8.4%. SMIC's A-shares rose 4.91%, while its Hong Kong shares gained 8.81%. Hua Hong's Hong Kong shares peaked at +18% and closed up 12%. The CSIC Semiconductor ETF attracted RMB 2.4 billion in net inflows today, pushing its asset size to a record high.
The situation for Hong Kong-listed large model companies is another story. Zhipu (02513.HK) fell 8.07% with a short interest ratio of 9.9%. MiniMax (00100.HK) dropped 7.40%, and its short interest ratio surged to 22.87%—the highest single-day short selling volume in the Hong Kong AI sector over the past three months. Both companies are representative of the upcoming wave of AI listings on the Hong Kong stock exchange in the second half of 2025, and their IPO prospectuses list the same core competitive advantage: "self-developed foundational large models."

Reactions on the other side of the Pacific were just as telling. NVIDIA opened down 1.8% on April 24, dipped as low as -2.6% intraday, and closed flat. Bloomberg's market snapshot compared this muted session to the V3 "DeepSeek moment" of January 27. The difference: the January event was a panic-driven sell-off that erased $600 billion in market value in a single day; this time it looked like a gentle repricing, modest in scale but clear in direction. Institutional buy-side notes coined a new phrase: "China's AI inference demand is beginning to decouple from North America's."
Stack these three dashboards together, and you have the market’s first verdict written within 24 hours of V4’s launch. After open source emerged victorious, capital began realigning—what matters now is no longer the model itself, but which GPU it runs on and which supply chain it’s embedded in.
30 days, 11 new models—V4 ignites the open-source community
The timing of the V4 release was itself part of what amplified the response.
Zoom out to the past 30 days. Between March 26 and April 24, at least 11 major large models with significant global impact were released or underwent major updates, covering nearly all key players: Anthropic Opus 4.6, Google Gemini 3.1 Pro, OpenAI GPT-5.5, Mistral Large 3, Meta Llama 4, Moonshot Kimi K2.6, Alibaba Qwen3-Next, ByteDance Doubao 2.5 Pro, Tencent HunYuan 3.0, Kimi K2.6 Plus, and finally DeepSeek V4, released on the early morning of April 23.
On average, a new model emerged every 2.7 days, faster than fund managers could read the press releases. Yet, scanning the 30-day K-line of AI assets in China and Hong Kong, only one name left a lasting mark on the chart. GPT-5.5's April 8 release gave NVIDIA a 4.2% single-day gain that proved to be a one-day peak. Then came DeepSeek V4 on April 23–24, triggering consecutive upward gaps across the China-Hong Kong computing power chain.

The difference does not lie in raw model capability. The gap between these 11 models on the LMArena leaderboard is typically no more than 50 points, a narrow band that puts them in roughly the same tier. The difference lies in the combination of two factors.
The first is open source. Among the top 10 models, only Llama 4 is open source, but its weight license comes with a long list of commercial restrictions, leading to lukewarm reception from the European and American developer community—it dropped out of the top 10 on OpenRouter by its third day. V4 uses the Apache 2.0 license, with no barriers to weights, no commercial restrictions, and synchronized release of inference code. This is the first flagship open-source model in the past six months to simultaneously pressure closed-source models on performance, price, and openness.
The second factor is timing. Amid a run of powerful moves by the closed-source camp, the open-source narrative had been squeezed again and again: Opus 4.6 pushed the coding benchmark SWE-Bench to a new high, while GPT-5.5 set a downward anchor price of $1.25 per million tokens. The debate over whether open source can catch up with closed source has raged in Silicon Valley for two years. V4, carrying an estimated 90 million monthly active users, has put that debate on hold as the flagship of open source.
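The $1.25 anchor is easiest to feel as arithmetic. A minimal sketch of what it implies for an inference bill; the daily token volume below is a hypothetical illustration, not a figure from the article:

```python
# What the $1.25-per-million-token anchor price implies for a monthly bill.
# The workload size is a hypothetical illustration, not a reported figure.
PRICE_PER_M_TOKENS = 1.25            # USD, the GPT-5.5 anchor quoted above
daily_tokens = 2_000_000_000         # hypothetical: 2B tokens served per day

# Bill scales linearly: (tokens / 1M) * price, over a 30-day month.
monthly_cost = (daily_tokens / 1_000_000) * PRICE_PER_M_TOKENS * 30
print(f"${monthly_cost:,.0f}/month")  # $75,000/month at this volume
```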
According to a leading domestic fund manager during a roadshow, “Before V4, we applied a discount to the valuation of open-source large models; after V4, this discount began to reverse.”
DeepSeek rewrites the pricing table for the computing power supply chain
In the V4 release notes, a line appeared that had never before shown up in any Chinese large model's official documentation: "On Day 0, fully adapted to Cambricon MLU590 and Huawei Ascend 950PR, with deployment code simultaneously open-sourced." The weight of this line can only be understood by connecting three parallel threads that unfolded over the past 12 months: one in hardware, one in software, and one in Silicon Valley's response.
The first thread is on the chip side. Huawei’s Ascend 950PR officially entered mass production in December 2025, delivering 1.56 PFLOPS of FP4 compute power and 112 GB of HBM capacity—the first time a domestically developed AI chip has matched NVIDIA’s B-series on key hardware metrics. In V4-style MoE inference tasks with 1 trillion parameters, single-card throughput is 2.87 times higher than the H20. The accompanying CANN 8.0 software stack optimizes LLM inference frameworks down to the operator level; DeepSeek’s publicly released benchmarks show that V4 on an Ascend super-node (8x 950PR chips) achieves 35% lower end-to-end inference latency than an equivalent H100 cluster. Cambricon’s MLU590 is even more aggressive, offering FP8 performance comparable to the H100 at less than half the price.
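The quoted hardware figures translate directly into relative economics. A back-of-envelope sketch using only the 2.87x single-card throughput number from the article; the equal-card-price baseline is an assumption for illustration, not a reported price:

```python
# Relative cost-per-token implied by the single-card throughput figure above.
# Only the 2.87x speedup comes from the text; prices are normalized to 1.0
# as an illustrative assumption.
h20_throughput = 1.0                       # baseline, arbitrary units
ascend_throughput = 2.87 * h20_throughput  # quoted single-card speedup vs H20

# At equal card price, cost per token is inversely proportional to throughput:
relative_cost = h20_throughput / ascend_throughput
print(f"Ascend 950PR ~{relative_cost:.2f}x the H20's cost per token")
```

Any real comparison would also fold in card price, power, and utilization, but the inverse-throughput relationship is the core of the repricing argument.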
The second thread is on the software side. On April 22, vLLM merged the Cambricon MLU backend PR, the first time an open-source inference framework natively supported a non-NVIDIA domestic GPU. Hygon's DCU takes a different path through the ROCm ecosystem but can fully run V4's MoE routing layer. V4 deployment is therefore no longer limited to "running only on one specific domestic GPU" but offers "a choice among multiple domestic GPUs." The ecosystem's dependence on a single vendor has been broken, a critical turning point for production.
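V4's internals beyond its parameter counts are not public, but the "MoE routing layer" a backend must support is a well-known component. A generic top-k gating sketch in NumPy; the expert count, k, and tensor shapes here are arbitrary illustrations, not V4's actual configuration:

```python
import numpy as np

def moe_route(hidden, gate_w, k=2):
    """Minimal top-k MoE gating: score all experts, keep the k best per
    token, and renormalize their softmax weights. Generic sketch only."""
    logits = hidden @ gate_w                      # [tokens, n_experts]
    top = np.argsort(logits, axis=-1)[:, -k:]     # indices of k best experts
    top_logits = np.take_along_axis(logits, top, axis=-1)
    probs = np.exp(top_logits - top_logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)         # weights over chosen experts
    return top, probs

rng = np.random.default_rng(0)
idx, w = moe_route(rng.normal(size=(4, 8)), rng.normal(size=(8, 16)), k=2)
print(idx.shape, w.shape)  # (4, 2) (4, 2); weights sum to 1 per token
```

In production this gate runs per token inside every MoE block, so porting it to a new backend mostly means providing efficient top-k, gather, and grouped expert matmuls, which is roughly what a backend port has to deliver.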
The third thread originates in Silicon Valley. On April 15, Jensen Huang was pressed by analysts on TSMC's earnings call about the progress of China's domestic computing power; his response was cold and precise: "If they truly manage to free LLMs from CUDA, it would be a disaster for us." Nine days later, DeepSeek answered with a single Day 0 announcement.

The phrase "domestic replacement" has been so overused over the past three years that it nearly lost its meaning. But from the morning of April 24, for the first time, there were concrete data points the market could price: single-card throughput, end-to-end inference latency, inference cost, and production-ready deployment code, quietly pushing a long-running rhetorical battle past the threshold into production.
This is the logic behind Cambricon's 11-day winning streak. It is no longer just a "domestic GPU concept stock" but a "DeepSeek V4 inference infrastructure provider." The same logic explains Hua Hong's 12% rally in Hong Kong: it fabricates the 950PR on a 7nm-equivalent process. Every V4 token running on Ascend chips represents capacity that would otherwise have flowed to NVIDIA and TSMC, now partially retained within the Pearl River Delta.
The next step has already been planned. According to Huawei’s roadmap, the 950DT (training version) is scheduled for delivery in the fourth quarter of 2026, with the goal of achieving full-stack training for V5 or equivalent-scale models on a 10,000-GPU cluster. If this path succeeds, CUDA’s competitive advantage in large model training in China will shift from “essential” to “optional.”
Source: 律动BlockBeats
