NVIDIA open-sources the 550B Nemotron 3 Ultra model with a Mamba-Transformer hybrid MoE architecture.

MarsBit
AI summary
On June 4, 2026, NVIDIA open-sourced Nemotron 3 Ultra, a 550B-parameter model with a Mamba-Transformer hybrid MoE architecture. The model scored 48 on the Artificial Analysis Intelligence Index, the highest score among open-weight models developed in the U.S., behind Kimi K2.6 at 54. It supports a 1-million-token context window with minimal memory overhead and delivers 5x the throughput of dense models of similar scale. The accompanying Agent Toolkit includes the NemoClaw orchestration blueprint and the OpenShell runtime. On-chain analysis shows rising open interest in AI-driven trading tools. The model is available on Hugging Face, NVIDIA NIM, and OpenRouter.

According to Beating Monitoring, on June 4 NVIDIA officially open-sourced its flagship large language model, Nemotron 3 Ultra, which has 550 billion total parameters, of which 55 billion are activated, and is optimized for long-horizon agent tasks such as complex planning, reasoning, and tool use. On the Intelligence Index of the third-party benchmark platform Artificial Analysis, Nemotron 3 Ultra scored 48 points, making it the highest-scoring open-weight model developed in the United States and placing it behind Kimi K2.6 from Moonshot AI, which scored 54 points. Technically, the model employs a Mamba-Transformer hybrid Mixture-of-Experts (MoE) architecture that alternates Mamba-2 state space model layers with Transformer self-attention layers. Because only the attention layers keep a KV cache, which grows with every token of context, while attention compute grows quadratically with sequence length, this design eases the memory bottleneck at extremely long contexts and enables a 1-million-token context window with minimal memory overhead. Compared with dense models of similar scale, the hybrid architecture improves throughput by 5x and reduces inference costs for agent tasks by 30%. For ecosystem support, NVIDIA simultaneously released the Agent Toolkit, which includes the NemoClaw orchestration blueprint and the OpenShell runtime. The open-sourced materials include the model weights, datasets, and training recipes. The model is now available on Hugging Face, NVIDIA NIM, and OpenRouter. Enterprise AI search provider Glean and others have announced integrations, positioning the model as a commercial alternative to proprietary large models.
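To give a rough sense of why replacing most attention layers with Mamba-2 layers matters at a 1-million-token context, the following back-of-the-envelope sketch estimates KV cache size for a single sequence. Every configuration number in it (layer count, KV heads, head dimension, the one-in-eight attention ratio) is a hypothetical placeholder chosen for illustration, not a published Nemotron 3 Ultra specification; the only figures taken from the article are the 1-million-token context and the 550B/55B parameter counts. Mamba-2 layers are treated as contributing no KV cache, since a state space layer keeps a fixed-size state regardless of context length.

```python
def kv_cache_bytes(context_tokens, attention_layers, kv_heads, head_dim,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    bytes_per_value=2 assumes 16-bit storage (an assumption, not a
    reported setting).
    """
    # Factor 2: one key vector and one value vector per token,
    # per KV head, per attention layer.
    return (2 * context_tokens * attention_layers
            * kv_heads * head_dim * bytes_per_value)


# Context length reported in the article.
CONTEXT = 1_000_000

# Hypothetical configuration for illustration only.
TOTAL_LAYERS = 96
KV_HEADS = 8
HEAD_DIM = 128
ATTENTION_LAYERS_HYBRID = TOTAL_LAYERS // 8  # assume 1 in 8 layers is attention

all_attention = kv_cache_bytes(CONTEXT, TOTAL_LAYERS, KV_HEADS, HEAD_DIM)
hybrid = kv_cache_bytes(CONTEXT, ATTENTION_LAYERS_HYBRID, KV_HEADS, HEAD_DIM)

# Share of parameters active per token, from the article's 55B of 550B.
active_fraction = 55 / 550

print(f"all-attention KV cache: {all_attention / 1e9:.1f} GB")
print(f"hybrid KV cache:        {hybrid / 1e9:.1f} GB")
print(f"active parameter share: {active_fraction:.0%}")
```

Under these assumed numbers the cache shrinks in direct proportion to the number of attention layers that remain, which is the effect the hybrid layout relies on. Actual savings depend on the real layer mix, head configuration, and cache precision.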

Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.