Chinese AI Milestone: 1.6T-Parameter DeepSeek Model Completes Full-Parameter Post-Training on Domestic Ascend 910C

A joint team comprising Shenzhen HeTao College, Harbin Institute of Technology (Shenzhen), the Shenzhen Institute of Big Data, and Huawei-related teams has completed full-parameter post-training of the 1.6-trillion-parameter DeepSeek-V4-Pro model on China’s domestic Ascend 910C AI platform. This marks the first time a third-party group has carried out full-parameter post-training of a 1.6T-parameter model on a domestic computing platform, in this case a cluster of over 1,000 Ascend 910C chips. The run achieved a Model FLOPs Utilization (MFU) above 30% and a 14% improvement in key operator efficiency, with zero interruptions across 1,500+ training steps. The result underscores the growing strength of domestic AI capabilities and infrastructure.
ME AI News: According to monitoring by Beating, a joint research team comprising Shenzhen HeTao College, Harbin Institute of Technology (Shenzhen), the Shenzhen Institute of Big Data, and Huawei-related teams, working with the Shenzhen Smart City AI Computing Platform, has completed full-parameter post-training of the 1.6-trillion-parameter large model DeepSeek-V4-Pro on a domestic AI computing platform. This marks the first time a third-party organization anywhere in the world has accomplished full-parameter post-training of a 1.6-trillion-parameter model on a domestic computing platform.
Unlike pre-training from scratch, the post-training phase consists mainly of supervised fine-tuning (SFT) and reinforcement learning (RL), and teaches a model to follow instructions and perform specific tasks through high-quality prompts and alignment with human preferences. Even so, full-parameter post-training of a 1.6-trillion-parameter mixture-of-experts (MoE) model places extremely demanding requirements on the underlying hardware: device memory capacity, inter-card communication bandwidth (for example, the all-to-all communication triggered by MoE routing), and the stability of large-scale clusters.
Using a Huawei Ascend 910C computing cluster of more than a thousand chips, the joint team overcame communication bottlenecks by optimizing distributed workload partitioning and load-balancing strategies. Across more than 1,500 training steps the system ran without a single interruption, achieving a Model FLOPs Utilization (MFU) above 30% and a 14% improvement in key operator efficiency, with all metrics meeting industrial-grade operational standards.
Industry analysts note that the successful execution of trillion-parameter model training on Huawei Ascend 910C clusters confirms the technical feasibility of using domestic AI chips for deep training tasks on ultra-large-scale models.
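For readers unfamiliar with the MFU figure quoted above, the following is a minimal sketch of how such a ratio is commonly estimated: achieved training FLOPs per second divided by the cluster's theoretical peak. It assumes the widely used approximation of about 6 FLOPs per activated parameter per token for a combined forward and backward pass, and it ignores attention costs. The function name and every input value below are hypothetical placeholders chosen for illustration; they are not figures from the joint team, nor published specifications of DeepSeek-V4-Pro or the Ascend 910C.

```python
def model_flops_utilization(tokens_per_second: float,
                            active_params: float,
                            num_chips: int,
                            peak_flops_per_chip: float) -> float:
    """Estimate MFU as achieved FLOPs/s divided by theoretical peak FLOPs/s.

    Uses the common ~6 * N FLOPs-per-token approximation for training,
    where N is the number of parameters activated per token (for an MoE
    model this is far smaller than the total parameter count).
    """
    if min(tokens_per_second, active_params, num_chips, peak_flops_per_chip) <= 0:
        raise ValueError("all inputs must be positive")
    flops_per_token = 6.0 * active_params            # forward + backward pass
    achieved_flops = tokens_per_second * flops_per_token
    peak_flops = num_chips * peak_flops_per_chip     # whole-cluster peak
    return achieved_flops / peak_flops


# Hypothetical inputs, for illustration only (not reported by the team).
example_mfu = model_flops_utilization(
    tokens_per_second=1.0e6,      # assumed cluster-wide training throughput
    active_params=4.0e10,         # assumed activated parameters per token
    num_chips=1024,               # assumed cluster size ("over a thousand")
    peak_flops_per_chip=7.5e14,   # assumed per-chip peak, placeholder value
)
print(f"Estimated MFU: {example_mfu:.1%}")
```

With these placeholder inputs the estimate comes out slightly above 30%, which shows the kind of throughput a cluster of this size would need to sustain to reach the reported range. The real calculation depends on the model's actual activated parameter count and the hardware's actual peak rate.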
Previously, core pre-training of large models relied heavily on NVIDIA GPU clusters, while domestic computing platforms were used mainly for inference or for fine-tuning small-parameter models. This breakthrough signals that China’s domestic computing ecosystem is moving rapidly from “supporting only inference” toward a complete technical closed loop that covers full-parameter training of ultra-large models. (Source: MLion)