MiniMax Launches the MSA Sparse Attention Method and the MiniMax-M3 Model
KuCoin Flash
On-chain news reports that MiniMax has unveiled MSA (MiniMax Sparse Attention), a sparse attention method built on Grouped Query Attention (GQA). MSA splits attention into two branches: an index branch that selects 16 token blocks for each GQA group, and a main branch that computes exact softmax attention over only the tokens in those selected blocks.

MSA was trained on a 109B-parameter mixture-of-experts (MoE) model, and MiniMax has open-sourced `fmha_sm100`, an inference kernel for NVIDIA SM100 GPUs, under the MIT license. The company also launched MiniMax-M3, a production model that matches full-attention baselines on multiple benchmarks. New token listings may benefit from these advancements in model efficiency and performance.
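The two-branch design described above can be illustrated with a minimal, pure-Python sketch of block-sparse GQA attention for a single decoding step. Only the "16 blocks per GQA group" figure comes from the announcement. The index-branch scoring rule (the group's mean query dotted with each block's mean key), the block size, and every function name below are hypothetical choices made for illustration. This is not MiniMax's implementation and not the `fmha_sm100` kernel.

```python
import math
import random

def softmax_attend(q, keys, values):
    """Exact softmax attention of one query vector over lists of key/value vectors."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j, vj in enumerate(v):
            out[j] += (w / total) * vj
    return out

def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def block_sparse_gqa_decode(queries, keys, values, group_size, block_size, top_k=16):
    """queries: [num_q_heads][d]; keys/values: [num_kv_heads][seq_len][d].
    Query heads g*group_size .. (g+1)*group_size-1 share KV head g.
    Returns one output vector per query head, in query-head order."""
    outputs = []
    for g in range(len(keys)):
        group_q = queries[g * group_size:(g + 1) * group_size]
        k_g, v_g = keys[g], values[g]
        # Split this KV head's sequence into contiguous token blocks.
        blocks = [(s, min(s + block_size, len(k_g))) for s in range(0, len(k_g), block_size)]
        # Index branch (HYPOTHETICAL scoring rule): rank blocks by the dot product
        # of the group's mean query with each block's mean key, keep the top_k.
        q_bar = mean_vec(group_q)
        block_scores = [sum(a * b for a, b in zip(q_bar, mean_vec(k_g[s:e]))) for s, e in blocks]
        chosen = sorted(range(len(blocks)), key=lambda i: block_scores[i], reverse=True)[:top_k]
        chosen.sort()  # restore positional order of the selected blocks
        sel_k = [k for i in chosen for k in k_g[blocks[i][0]:blocks[i][1]]]
        sel_v = [v for i in chosen for v in v_g[blocks[i][0]:blocks[i][1]]]
        # Main branch: exact softmax attention for every query head in the group,
        # restricted to the tokens of the blocks selected once for the whole group.
        for q in group_q:
            outputs.append(softmax_attend(q, sel_k, sel_v))
    return outputs

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Usage with illustrative sizes: 256 tokens in blocks of 8 gives 32 blocks per KV head.
random.seed(0)
d, seq_len, num_kv_heads, group_size, block_size = 8, 256, 2, 4, 8
queries = rand_matrix(num_kv_heads * group_size, d)
keys = [rand_matrix(seq_len, d) for _ in range(num_kv_heads)]
values = [rand_matrix(seq_len, d) for _ in range(num_kv_heads)]

sparse = block_sparse_gqa_decode(queries, keys, values, group_size, block_size, top_k=16)
dense = block_sparse_gqa_decode(queries, keys, values, group_size, block_size, top_k=10**9)
```

Setting `top_k` at least as large as the number of blocks makes every block eligible, so the sketch reduces to ordinary dense GQA attention, which is a convenient sanity check. With the illustrative sizes above, each query reads 16 × 8 = 128 of 256 tokens per KV head. In this sketch the number of attended tokens stays fixed at `top_k × block_size` as the context grows, which is where block-sparse selection saves work compared with full attention.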