DeepSeek V4 Technical Specifications Revealed: 1.6T Parameters, 384 Experts with 6 Activated

On April 22 (UTC+8), Princeton PhD student Yifan Zhang shared technical details about DeepSeek V4 on X. According to the post, the model has 1.6 trillion parameters and 384 MoE experts with six activated per token, and there is a 285B-parameter V4-Lite variant. Training reportedly used the Muon optimizer and a 32K pre-training context length, with the final context length extended to 1M. DeepSeek has not commented. Market sentiment remains mixed, with the Fear & Greed Index indicating moderate uncertainty.

ME News reports that on April 22 (UTC+8), according to monitoring by Beating, Princeton PhD student Yifan Zhang posted updated technical details of DeepSeek V4 on X. On April 19, he had previewed “V4 next week” and listed three architectural components; on the evening of April 22, he released the full parameter table and disclosed for the first time a lightweight variant, V4-Lite, with 285B parameters.

According to the post, V4 has 1.6T total parameters. The attention mechanism is DSA2, which combines DSA (DeepSeek Sparse Attention), previously used by DeepSeek in V3.2, with NSA (Native Sparse Attention), newly proposed in a paper this year. It uses a head dimension of 512 and is paired with Sparse MQA and SWA (Sliding Window Attention). The MoE layer consists of 384 experts, with 6 activated per token, and uses a Fused MoE Mega-Kernel. Residual connections continue to use Hyper-Connections.

Training details disclosed for the first time include the optimizer, Muon (a matrix-level optimizer that applies Newton-Schulz orthogonalization to momentum updates); a pretraining context length of 32K; and the use of GRPO with KL divergence correction in the reinforcement learning phase. The final context length has been extended to 1M. The model is text-only.

Zhang is not affiliated with DeepSeek, and DeepSeek has not responded to the above information. (Source: BlockBeats)
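To make the reported routing figures concrete (384 experts, 6 activated per token), here is a minimal Python sketch of top-k expert selection for a single token. It is an illustration only, not DeepSeek's implementation: the function name `route_token`, the random router logits, and the choice of a softmax over the selected logits are all assumptions, since the post does not say which gating function V4 uses.

```python
import math
import random

NUM_EXPERTS = 384  # expert count reported in the post
TOP_K = 6          # experts activated per token, as reported


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token; return (expert_index, weight) pairs.

    Assumption: weights are a softmax over the selected logits only.
    The source does not state which gating function V4 uses.
    """
    if top_k > len(router_logits):
        raise ValueError("top_k cannot exceed the number of experts")
    # Indices of the top_k largest logits, highest first.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Hypothetical router scores for one token (fixed seed for repeatability).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

assignment = route_token(logits)
active_fraction = TOP_K / NUM_EXPERTS  # 6 / 384 = 0.015625

for expert, weight in assignment:
    print(f"expert {expert:3d}  weight {weight:.3f}")
print(f"fraction of experts active per token: {active_fraction:.4%}")
```

With these figures, each token runs through 6 of 384 experts, or about 1.56% of the experts in a layer. This is a fraction of experts, not of parameters: the post gives no activated-parameter count, so none is inferred here.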
