DeepSeek V4 open-source model launches with 1.6 trillion parameters and MIT license

Summary

On April 24 (UTC+8), DeepSeek launched a preview of its open-source V4 series models under the MIT license. The V4-Pro and V4-Flash MoE models have 1.6 trillion and 284 billion parameters, respectively, and both support a 1-million-token context. V4-Pro reduces inference FLOPs by 73% and KV-cache memory usage by 90% compared with V3.2. Weights are available on Hugging Face and ModelScope.

ME News reports that on April 24 (UTC+8), according to BlockBeats monitoring, DeepSeek released the preview version of its V4 series under the MIT license, with weights now available on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro, with 1.6T total parameters and 49B activated per token; and V4-Flash, with 284B total parameters and 13B activated per token. Both support a 1M-token context length.

The release brings three architectural upgrades:
- A hybrid attention mechanism, combining Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA), significantly reduces long-context overhead: at 1M context, V4-Pro's single-token inference FLOPs are only 27% of V3.2's, and its KV cache (the GPU memory used to store historical context during inference) is just 10% of V3.2's.
- Manifold-constrained Hyperconnection (mHC) replaces traditional residual connections to stabilize cross-layer signal propagation.
- Training now uses the Muon optimizer for faster convergence.

Pre-training data exceeds 32T tokens. Post-training occurs in two stages: first, domain-specific experts are trained with SFT and GRPO reinforcement learning; then online distillation unifies them into a single model.

V4-Pro-Max (the highest inference-intensity mode) claims to be the strongest open-source model currently available, achieving top-tier performance on coding benchmarks and significantly narrowing the gap with closed-source state-of-the-art models on reasoning and agent tasks. V4-Flash-Max delivers reasoning performance close to V4-Pro's given a sufficient thinking budget, but is limited by its parameter scale on pure-knowledge and complex agent tasks. Weights are stored in FP4+FP8 mixed precision. (Source: BlockBeats)
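The headline figures above can be sanity-checked with some back-of-envelope arithmetic. The sketch below uses only the numbers reported in the article (total vs. activated parameters, and the 27%/10% FLOPs and KV-cache ratios); the 400 GB baseline KV-cache figure is a purely hypothetical example, not a DeepSeek specification.

```python
# Back-of-envelope check of the efficiency figures quoted in the article.
# Parameter counts are in billions; ratios are as reported vs. V3.2.

def activated_fraction(total_b: float, active_b: float) -> float:
    """Fraction of an MoE model's weights activated per token."""
    return active_b / total_b

# V4-Pro: 1.6T (1600B) total, 49B activated per token.
pro = activated_fraction(1600, 49)     # ~3.1% of weights touched per token
# V4-Flash: 284B total, 13B activated per token.
flash = activated_fraction(284, 13)    # ~4.6%

# Quoted long-context savings at 1M context, relative to V3.2:
FLOPS_RATIO = 0.27   # single-token inference FLOPs are 27% of V3.2's
KV_RATIO = 0.10      # KV cache is 10% of V3.2's (i.e. a 90% reduction)

def v4_kv_cache_gb(v32_kv_gb: float) -> float:
    """Project a V4-Pro KV-cache footprint from a hypothetical V3.2 one."""
    return v32_kv_gb * KV_RATIO

print(f"V4-Pro activates {pro:.1%} of its weights per token")
print(f"V4-Flash activates {flash:.1%} of its weights per token")
print(f"FLOPs reduction vs. V3.2: {1 - FLOPS_RATIO:.0%}")
# Hypothetical example: a 400 GB V3.2-sized KV cache would shrink to 40 GB.
print(f"400 GB of V3.2 KV cache -> {v4_kv_cache_gb(400):.0f} GB on V4-Pro")
```

The sparse activation ratios are what make the 1.6T total parameter count practical: only a few percent of the weights participate in any single forward pass.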

Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.