MiniMax Launches the MSA Sparse Attention Method and the MiniMax-M3 Model

ME AI message: MiniMax has released MSA (MiniMax Sparse Attention), a sparse attention method built on Grouped Query Attention (GQA). MSA splits attention into an index branch and a main branch. The index branch works at block granularity (128 tokens per block by default) and selects 16 blocks per GQA group, which gives a fixed budget of 16 × 128 = 2,048 key-value tokens; the main branch then computes exact softmax attention over only the selected blocks. MSA was trained on a 109B-parameter MoE model. MiniMax has open-sourced the inference kernel `fmha_sm100` for NVIDIA SM100 GPUs (MIT license; supports BF16, FP8, NVFP4, and FP4), along with the production model MiniMax-M3. MSA-PT scores 67.2 on MMLU, 77.7 on GSM8K, 64.0 on HumanEval, 84.2 on RULER-8K, and 77.5 on RULER-32K, matching the full-attention baseline. At a 128K context length, its exp-free top-k selection is 5.1x faster than `torch.topk`. (Source: AiHot)
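The two-branch idea can be illustrated with a small pure-Python sketch of a single decoding step for one GQA group. This is not MiniMax's implementation and does not reflect the `fmha_sm100` kernel. The summary does not say how the index branch scores blocks, so the scoring rule here (summed dot product between each query head in the group and a block's mean key) is an assumption, and the function names `select_blocks` and `sparse_attention` are hypothetical. Only the block size, the number of blocks per group, and the resulting budget come from the summary.

```python
import math
import random

BLOCK_SIZE = 128        # tokens per block (default stated in the summary)
BLOCKS_PER_GROUP = 16   # blocks selected per GQA group
KV_BUDGET = BLOCK_SIZE * BLOCKS_PER_GROUP  # 2,048 key-value tokens


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(group_queries, keys, block_size, top_k):
    """Index branch (ASSUMED scoring rule, not from the source):
    rank each block by the summed dot product between every query head
    in the group and the block's mean key, then keep the top_k blocks."""
    n_blocks = math.ceil(len(keys) / block_size)
    dim = len(keys[0])
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        scores.append(sum(dot(q, mean_key) for q in group_queries))
    ranked = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])  # one shared selection for the whole group


def sparse_attention(group_queries, keys, values,
                     block_size=BLOCK_SIZE, top_k=BLOCKS_PER_GROUP):
    """Main branch: exact softmax attention restricted to the selected blocks.
    All query heads in the group share the same keys, values, and selection."""
    picked = select_blocks(group_queries, keys, block_size, top_k)
    token_ids = [t for b in picked
                 for t in range(b * block_size,
                                min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    outputs = []
    for q in group_queries:
        logits = [dot(q, keys[t]) * scale for t in token_ids]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        outputs.append([
            sum(w * values[t][d] for w, t in zip(weights, token_ids)) / total
            for d in range(dim_v)
        ])
    return picked, outputs


if __name__ == "__main__":
    random.seed(0)
    n_tokens, dim, heads_in_group = 4096, 8, 2  # toy sizes for illustration
    keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    queries = [[random.gauss(0, 1) for _ in range(dim)]
               for _ in range(heads_in_group)]
    picked, out = sparse_attention(queries, keys, values)
    print(len(picked), "blocks selected;",
          len(picked) * BLOCK_SIZE, "of", n_tokens, "tokens attended")
```

In this sketch the cost of the main branch depends on the fixed budget rather than on the context length, which is the property the summary attributes to MSA. When the context holds no more blocks than the budget allows, every block is selected and the result equals ordinary full attention.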