Chinese AI Milestone: Full-Parameter Post-Training of 1.6T-Parameter DeepSeek Model Completed on Domestic Ascend 910C

ME AI News: According to monitoring by Beating, a joint research team comprising Shenzhen Hekou University, Harbin Institute of Technology (Shenzhen), Shenzhen Big Data Research Institute, and Huawei-related teams, working with the Shenzhen Smart City AI Computing Platform, has completed full-parameter post-training of the 1.6-trillion-parameter large model DeepSeek-V4-Pro on a domestic AI computing platform. According to the report, this is the first time worldwide that a third-party organization has completed full-parameter post-training of a 1.6-trillion-parameter model on a domestic computing platform. Unlike pre-training from scratch, post-training consists mainly of supervised fine-tuning (SFT) and reinforcement learning (RL), which teach a model to follow instructions and perform specific tasks using high-quality instruction data and alignment with human preferences. Even so, full-parameter post-training of a 1.6-trillion-parameter mixture-of-experts (MoE) model places extremely heavy demands on the underlying hardware: accelerator memory capacity, inter-card communication bandwidth (for example, the all-to-all communication triggered by MoE routing), and the stability of a large-scale cluster. Using a Huawei Ascend 910C cluster of more than a thousand chips, the joint team overcame the communication bottlenecks by optimizing how the workload is partitioned across devices and by tuning its load-balancing strategies. Over more than 1,500 training steps the system ran without a single interruption, reaching a model FLOPs utilization (MFU) above 30% and a 14% improvement in the efficiency of key operators, with all metrics meeting industrial-grade operational standards. Industry analysts note that running trillion-parameter model training on Huawei Ascend 910C clusters confirms that domestic AI chips are technically capable of supporting deep training tasks for ultra-large-scale models.
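For readers unfamiliar with MFU, the metric is the ratio of the floating-point throughput a training run actually achieves to the theoretical peak of the hardware it runs on. The sketch below is illustrative only and is not the team's measurement method: it uses the common approximation of about 6 FLOPs per activated parameter per token for a forward and backward pass, and every input value (activated parameter count, token throughput, per-chip peak) is a hypothetical placeholder, not a figure from the report or a published Ascend 910C specification.

```python
def model_flops_utilization(active_params: float,
                            tokens_per_second: float,
                            num_chips: int,
                            peak_flops_per_chip: float) -> float:
    """Estimate MFU as achieved FLOP/s divided by aggregate peak FLOP/s.

    Assumes roughly 6 FLOPs per activated parameter per token
    (forward + backward), a widely used rule of thumb. For an MoE
    model only the parameters activated per token are counted.
    """
    if min(active_params, tokens_per_second, num_chips, peak_flops_per_chip) <= 0:
        raise ValueError("all inputs must be positive")
    achieved_flops_per_second = 6.0 * active_params * tokens_per_second
    peak_flops_per_second = num_chips * peak_flops_per_chip
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical placeholder inputs, chosen only to exercise the formula.
example_mfu = model_flops_utilization(
    active_params=40e9,          # assumed activated parameters per token
    tokens_per_second=400_000,   # assumed cluster-wide training throughput
    num_chips=1024,              # assumed cluster size
    peak_flops_per_chip=300e12,  # assumed peak FLOP/s per chip
)
print(f"Estimated MFU: {example_mfu:.1%}")
```

Because the numerator counts only the FLOPs the model needs, time lost to communication, load imbalance between experts, or restarts lowers MFU directly, which is why it is often quoted alongside stability figures.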
Previously, core pre-training of large models relied heavily on NVIDIA GPU clusters, while domestic computing platforms were used mainly for inference or for fine-tuning small-parameter models. This breakthrough signals that China's domestic computing ecosystem is moving quickly from "supporting only inference" toward a complete technical closed loop that includes full-parameter training of ultra-large models. (Source: MLion)