Nucleus-Image Open-Sourced with 17B Parameters, 2B Activated per Inference

Summary

On April 16 (UTC+8), Nucleus AI open-sourced the Nucleus-Image text-to-image model under the Apache 2.0 license. The model is a sparse MoE diffusion transformer with 17 billion total parameters, of which only about 2 billion are active during inference, reducing costs. Without any post-training, it matched or outperformed leading closed-source models on three benchmarks.

ME News reports that on April 16 (UTC+8), according to BlockBeats monitoring, the Nucleus AI team released the text-to-image model Nucleus-Image and simultaneously open-sourced the model weights, training code, and training dataset under the Apache 2.0 license, which permits commercial use.

The model uses a sparse Mixture-of-Experts (MoE) diffusion transformer architecture with 17B total parameters distributed across 64 routing experts per layer. Only about 2B parameters are activated during inference, which significantly reduces inference cost compared with dense models of similar scale.

On three standard benchmarks, Nucleus-Image matches or exceeds leading proprietary models. It achieves a GenEval score of 0.87, tying with Qwen Image, and ranks first among all compared models in the spatial positioning subtask (0.85). It scores 88.79 on DPG-Bench, placing first overall. On OneIG-Bench it reaches 0.522, ahead of Google's Imagen4 (0.515) and Recraft V3 (0.502). All of these results come from pure pre-training, without DPO, reinforcement learning, or human preference tuning. Nucleus AI describes the release as "the first fully open-source MoE diffusion model at this quality level."

The training data was scraped from the web at large scale, then filtered, deduplicated, and scored for aesthetics, retaining 700 million images and yielding 1.5 billion text-image pairs. Training proceeded in three stages that progressively raised the resolution from 256 to 1024, for a total of 1.7 million steps.

The text encoder is Qwen3-VL-8B-Instruct, invoked via the diffusers library, and the model caches text key-value (KV) states across denoising steps to further cut inference overhead. For developers who want to run image generation locally, the combination of 17B total parameters with only about 2B activated makes deployment on consumer-grade GPUs feasible; illustrative sketches of the routing and caching ideas follow below.

Fully open-sourcing weights, training code, and dataset together is rare. Most open-source image models release only the weights and keep datasets and training details proprietary, which has been a major bottleneck for reproducible research in text-to-image generation. (Source: BlockBeats)
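
To make the parameter figures concrete, below is a minimal PyTorch sketch of top-k routing in a sparse MoE layer, showing how only the experts selected for each token do any compute. This is an illustrative sketch of the general technique, not Nucleus-Image's actual architecture: the dimensions, the choice of top_k=2, and the class name SparseMoELayer are all hypothetical assumptions.

```python
# Hypothetical sketch of sparse MoE top-k routing; NOT the official Nucleus-Image code.
# With 64 experts per layer and only top_k of them selected per token, most expert
# weights stay idle on any given forward pass, which is why a 17B-parameter model
# can activate only ~2B parameters at inference time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, dim=1024, num_experts=64, top_k=2, hidden=4096):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # per-token routing scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                               # x: (batch, tokens, dim)
        scores = self.router(x)                         # (B, T, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                           # (B, T, top_k): tokens routed to expert e
            if not mask.any():
                continue                                # expert not selected: no compute at all
            tok_mask = mask.any(dim=-1)                 # (B, T) tokens that use this expert
            gate = (weights * mask).sum(dim=-1)[tok_mask].unsqueeze(-1)
            out[tok_mask] += gate * expert(x[tok_mask])
        return out

layer = SparseMoELayer()
y = layer(torch.randn(1, 16, 1024))  # only the routed experts run for these 16 tokens
print(y.shape)                       # torch.Size([1, 16, 1024])
```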

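Similarly, the cross-denoising-step text KV caching mentioned above can be sketched as follows. Because the prompt does not change between denoising steps, its cross-attention keys and values only need to be computed once and can then be reused at every step. This is a hypothetical toy implementation under assumed shapes and names, not the official Nucleus-Image code.

```python
# Toy sketch of cross-denoising-step text KV caching; shapes and names are assumptions.
# The text hidden states (here a stand-in for Qwen3-VL-8B-Instruct output) are projected
# to cross-attention keys/values once, then reused by every denoising step.
import torch
import torch.nn as nn

class TextKVCache:
    def __init__(self, dim_text=1024, dim_model=1024):
        self.k_proj = nn.Linear(dim_text, dim_model)  # cross-attention key projection
        self.v_proj = nn.Linear(dim_text, dim_model)  # cross-attention value projection
        self._kv = None

    @torch.no_grad()
    def get(self, text_hidden):                       # text_hidden: (1, seq, dim_text)
        if self._kv is None:                          # computed on the first step only
            self._kv = (self.k_proj(text_hidden), self.v_proj(text_hidden))
        return self._kv                               # cache hit on every later step

# Toy denoising loop: the text KV pair is built once and reused 50 times.
text_hidden = torch.randn(1, 77, 1024)                # stand-in for the text encoder output
cache = TextKVCache()
latents = torch.randn(1, 16, 1024)
for step in range(50):
    k, v = cache.get(text_hidden)                     # no recomputation after step 0
    attn = torch.softmax(latents @ k.transpose(-1, -2) / 32.0, dim=-1) @ v
    latents = latents - 0.01 * attn                   # stand-in for the real denoising update
```
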
Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.