Chinese AI Milestone: 1.6-Trillion-Parameter DeepSeek Model Completes Full-Parameter Post-Training on Domestic Ascend 910C

ME AI News: According to monitoring by Beating, a joint research team made up of Shenzhen Hekou University, Harbin Institute of Technology (Shenzhen), Shenzhen Big Data Research Institute, and Huawei-related teams, working with the Shenzhen Smart City AI compute platform, has completed full-parameter post-training of the 1.6-trillion-parameter large model DeepSeek-V4-Pro on a domestic AI compute platform. According to the report, this is the first time worldwide that a third-party organization has completed full-parameter post-training of a 1.6-trillion-parameter model on a domestic compute platform.
Unlike pre-training from scratch, post-training (mainly supervised fine-tuning, SFT, and reinforcement learning, RL) focuses on teaching the model to follow instructions and perform specific tasks, using high-quality instruction data and alignment with human preferences. Even so, full-parameter post-training of a 1.6-trillion-parameter Mixture-of-Experts (MoE) model places extremely heavy demands on the underlying hardware: accelerator memory capacity, inter-card communication bandwidth (for example, the all-to-all communication triggered by MoE routing), and the stability of a large-scale cluster.
Using a Huawei Ascend 910C compute cluster of more than a thousand chips, the joint team overcame the communication bottlenecks by optimizing how the distributed workload is partitioned and how load is balanced. Across more than 1,500 training steps the system ran without interruption, with model FLOPs utilization (MFU) above 30% and a 14% improvement in key operator efficiency; the report says all metrics met industrial-grade operating standards. Industry analysts note that running trillion-parameter model training successfully on Huawei Ascend 910C clusters confirms that domestic AI chips are technically capable of handling deep training tasks on ultra-large-scale models.
Previously, core pre-training of large models relied heavily on NVIDIA GPU clusters, while domestic compute platforms were used mainly for inference or for fine-tuning small-parameter models. The success of this joint effort signals that China's domestic compute ecosystem is moving quickly from "supporting only inference" toward a closed technical loop capable of full-parameter training of ultra-large models. (Source: MLion)
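
To put the memory requirement and the reported MFU figure in context, the arithmetic behind them can be sketched roughly as follows. This is a back-of-the-envelope illustration, not the team's method. Only the 1.6-trillion parameter count comes from the report. Everything else is an assumption for illustration: the 16 bytes per parameter (a common rule of thumb for mixed-precision training with an Adam-style optimizer), the approximation of about 6 FLOPs per active parameter per token for a forward and backward pass, and every hardware and throughput number in the example, none of which are Ascend 910C specifications or measurements from this run.

```python
import math


def training_state_bytes(num_params: int, bytes_per_param: float = 16.0) -> float:
    """Rough size of weights + gradients + optimizer state.

    bytes_per_param=16 is an assumed rule of thumb (mixed precision with an
    Adam-style optimizer); activations and buffers are NOT included.
    """
    return num_params * bytes_per_param


def min_devices_for_state(state_bytes: float, device_memory_bytes: float) -> int:
    """Smallest device count whose combined memory could hold the state."""
    return math.ceil(state_bytes / device_memory_bytes)


def mfu(tokens_per_second: float, active_params: float,
        num_devices: int, peak_flops_per_device: float) -> float:
    """Model FLOPs utilization: achieved model FLOPs / theoretical peak FLOPs.

    Uses the common ~6 * N FLOPs-per-token approximation, where N is the
    number of parameters active per token (for MoE, not the total count).
    """
    achieved_flops = 6.0 * active_params * tokens_per_second
    return achieved_flops / (num_devices * peak_flops_per_device)


# Parameter count from the report; all other values below are hypothetical.
TOTAL_PARAMS = 1_600_000_000_000

state = training_state_bytes(TOTAL_PARAMS)
print(f"training state: {state / 1e12:.1f} TB")

# Hypothetical 64 GB of memory per device.
print("devices needed just for state:", min_devices_for_state(state, 64e9))

# Hypothetical throughput, active-parameter count, and per-device peak.
example = mfu(tokens_per_second=1e6, active_params=50e9,
              num_devices=1000, peak_flops_per_device=1e15)
print(f"example MFU: {example:.0%}")
```

Under these assumed figures the training state alone comes to about 25.6 TB, which is why a model of this size has to be sharded across hundreds of accelerators before a single step can run, and why the all-to-all traffic between those shards becomes a bottleneck. The MFU helper only covers the training pass; RL post-training also spends time generating samples, so a reported MFU may be computed differently.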