DeepSeek V4 Series Released with 1.6 Trillion Parameters and MIT License

Summary

On April 24, DeepSeek released the V4 series under the MIT license. The models, now available on Hugging Face and ModelScope, include V4-Pro (1.6 trillion parameters) and V4-Flash (284 billion parameters), both supporting a 1 million token context window. The V4 series introduces three architectural enhancements, including a hybrid attention mechanism that reduces long-context computation costs: at a 1M-token context, V4-Pro uses only 27% of the FLOPs and 10% of the KV cache memory required by V3.2. Trained on over 32 trillion tokens, the models leverage SFT, GRPO, and online distillation.

ChainThink reports that on April 24, according to official information, DeepSeek released a preview version of its V4 series under the MIT license, with model weights now available on Hugging Face and ModelScope.


The series includes two MoE models: V4-Pro, with a total of 1.6 trillion parameters and 49 billion activated per token, and V4-Flash, with a total of 284 billion parameters and 13 billion activated per token. Both models support a context length of 1 million tokens.
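The sparsity implied by those figures can be checked with simple arithmetic; the parameter counts below come from the article, and only the ratio computation is added here:

```python
# Back-of-the-envelope activation ratios for the two reported MoE configs.
def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of parameters activated per token in an MoE model."""
    return active_params / total_params

v4_pro = active_fraction(1.6e12, 49e9)    # ~3.1% of weights used per token
v4_flash = active_fraction(284e9, 13e9)   # ~4.6% of weights used per token
print(f"V4-Pro:   {v4_pro:.1%} of parameters active per token")
print(f"V4-Flash: {v4_flash:.1%} of parameters active per token")
```

In other words, both models route each token through only a few percent of their total weights, which is what keeps per-token compute far below what the headline parameter counts suggest.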


The series features three upgrades. First, a hybrid attention mechanism combining Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA) significantly reduces long-context overhead: at a 1M-token context, V4-Pro requires only 27% of the per-token FLOPs and 10% of the KV cache memory of V3.2.
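To see why a 10x KV-cache reduction matters at 1M tokens, consider a rough sizing exercise. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not published V4 specifications; only the 10% figure comes from the article:

```python
# Illustrative KV-cache sizing at a 1M-token context. The model shape here
# (60 layers, 8 KV heads, head_dim 128) is a HYPOTHETICAL dense baseline,
# not DeepSeek's actual configuration.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=1):
    """Bytes for keys + values across all layers (FP8 -> 1 byte/element)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

baseline = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128,
                          seq_len=1_000_000)
print(f"Hypothetical dense KV cache @ 1M tokens: {baseline / 2**30:.1f} GiB")
print(f"At 10% of that (per the article):        {0.10 * baseline / 2**30:.1f} GiB")
```

For a single long-context request, a reduction on this order is the difference between a KV cache that spills across accelerators and one that fits comfortably on a single device.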


Second, manifold-constrained hyperconnections (mHC) replace traditional residual connections to stabilize cross-layer signal propagation. Third, training is accelerated using the Muon optimizer. The model was pre-trained on over 32 trillion tokens.
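DeepSeek has not published the details of mHC, so the following is only a toy contrast between a plain residual connection and one example of a manifold constraint (projecting the combined signal onto a fixed-norm sphere), to illustrate why such a constraint can keep activation scales from drifting across layers:

```python
# Toy illustration: plain residual vs. a norm-constrained combination.
# This is NOT DeepSeek's mHC formulation, which is unpublished; the sphere
# projection is just one simple instance of a manifold constraint.
import math

def residual(x, fx):
    """Standard residual connection: y = x + f(x)."""
    return [a + b for a, b in zip(x, fx)]

def constrained_residual(x, fx, radius=1.0):
    """Combine, then project the result back onto the radius-r sphere."""
    y = residual(x, fx)
    norm = math.sqrt(sum(v * v for v in y)) or 1.0
    return [v * radius / norm for v in y]

x = [0.6, 0.8]   # unit-norm input
fx = [3.0, 4.0]  # a large layer output
print(residual(x, fx))              # norm grows to 6.0 -> scale drift
print(constrained_residual(x, fx))  # norm pinned back to 1.0
```

The hedged intuition: stacking many unconstrained residual additions can let activation norms grow layer by layer, whereas a constraint keeps each layer's output on a well-conditioned scale.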


Post-training consists of two stages: domain-specific expert models are first trained separately using SFT and GRPO reinforcement learning, then unified into the final model through online distillation.
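The distillation stage can be sketched with a standard temperature-softened KL objective, where the student matches each expert teacher's output distribution. "Online" here means teacher outputs are produced during training rather than precomputed; DeepSeek's exact recipe is not public, so this is a generic sketch:

```python
# Generic knowledge-distillation objective: KL(teacher || student) over
# temperature-softened distributions. A sketch, not DeepSeek's actual loss.
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the softened teacher to the softened student."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(distill_kl([2.0, 0.5, -1.0], [1.8, 0.6, -0.9]))  # close match -> small
print(distill_kl([2.0, 0.5, -1.0], [-1.0, 2.0, 0.5]))  # mismatch -> larger
```

Averaging such a loss over batches drawn from each expert's domain is one plausible way several specialist models could be merged into a single student.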


DeepSeek claims V4-Pro-Max is the strongest open-source model currently available, achieving top-tier performance on coding benchmarks and significantly narrowing the gap with closed-source frontier models on reasoning and agent tasks.


Given a sufficient thinking budget, V4-Flash-Max achieves reasoning performance close to that of the Pro model, though its smaller parameter count limits it on pure-knowledge and complex agent tasks. Model weights are stored in mixed FP4+FP8 precision.
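Mixed FP4+FP8 storage has a direct effect on checkpoint size. The article does not say what fraction of weights uses each format, so the split below is an assumption for illustration only:

```python
# Rough checkpoint footprint for 1.6T parameters under an FP4+FP8 mix.
# The FP4 fraction is an ASSUMPTION; the article only states that weights
# are stored in mixed FP4+FP8 precision.
def checkpoint_bytes(n_params: float, frac_fp4: float) -> float:
    """FP4 = 0.5 bytes/param, FP8 = 1 byte/param (ignoring scale metadata)."""
    return n_params * (frac_fp4 * 0.5 + (1 - frac_fp4) * 1.0)

for frac in (0.0, 0.5, 1.0):
    tb = checkpoint_bytes(1.6e12, frac) / 1e12
    print(f"{frac:.0%} FP4: ~{tb:.1f} TB")
```

Whatever the actual split, the checkpoint lands between roughly 0.8 TB (all FP4) and 1.6 TB (all FP8), far below the ~3.2 TB an FP16 copy of the same weights would need.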
