MiniMax open-sources sparse attention library for NVIDIA Blackwell, M3 weights to launch Friday

ME AI News: According to monitoring by Beating, Ryan Lee, Head of Developer Relations at MiniMax, announced that MiniMax Sparse Attention (MSA), a high-performance attention library optimized for NVIDIA Blackwell (SM100) GPUs, has been open-sourced under the MIT license. He also said that the MiniMax-M3 weights are expected to be released this Friday. MSA is used for MiniMax-M3’s million-token context inference, where it computes attention only over the most relevant KV blocks within each GQA group. According to the paper, compared with a dense GQA configuration under the same settings, MSA reduces attention computation by 28.4x at a 1M-token context and delivers 14.2x faster prefill and 7.6x faster decoding on H800 GPUs. The open-source release combines C++ JIT and CuTe-DSL implementations in a single Python package, provides both Dense FlashAttention and Sparse Top-k Attention kernels, and supports multiple precision formats, including BF16, FP8, NVFP4, and FP4. For now, it primarily targets deployment on NVIDIA Blackwell (SM100) GPUs. (Source: BlockBeats)
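The idea of attending only to the most relevant KV blocks within a GQA group can be illustrated with a toy sketch in plain Python. This is not the MSA API or its kernels: the function name `sparse_topk_attention`, the mean-pooled block scoring, and the `block_size` and `top_k` parameters are all assumptions made for illustration, and the real library may select blocks quite differently.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_topk_attention(queries, keys, values, block_size, top_k):
    """Toy single-step attention for one GQA group (hypothetical, not MSA's API).

    queries: one query vector per query head in the group
    keys, values: the KV cache shared by the group, one vector per token
    Returns (one output vector per query head, sorted indices of chosen blocks).
    """
    dim = len(keys[0])
    n_blocks = math.ceil(len(keys) / block_size)

    # Assumed scoring rule: mean-pool each block's keys, then sum the
    # dot products with every query head that shares this KV head.
    block_scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        pooled = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        block_scores.append(sum(dot(q, pooled) for q in queries))

    # Keep the top_k highest-scoring blocks for the whole group.
    ranked = sorted(range(n_blocks), key=lambda b: block_scores[b], reverse=True)
    chosen = sorted(ranked[:top_k])
    token_ids = [
        t
        for b in chosen
        for t in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]

    # Ordinary softmax attention, restricted to tokens in the chosen blocks.
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        logits = [dot(q, keys[t]) * scale for t in token_ids]
        peak = max(logits)
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        outputs.append([
            sum(w * values[t][d] for w, t in zip(weights, token_ids)) / total
            for d in range(len(values[0]))
        ])
    return outputs, chosen
```

In this sketch, the work per query head scales with `top_k * block_size` rather than with the full context length, which is the general source of the savings that sparse attention aims for. Setting `top_k` to the total number of blocks recovers dense attention.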