Sand.ai secures over $100 million in funding and plans to launch an open-source MoE video model in July 2026.

ME AI News: According to monitoring by Beating, video generation large-model company Sand.ai (founded in January 2024) has announced the completion of two funding rounds totaling over $100 million. Investors include Look Capital, Lollapalooza Capital (Wang Huiwen’s family office), Jiukun Venture Capital, Matrix Partners China, MSA Capital, Sinovation Ventures, Source Code Capital, IDG, Baidu Ventures, and other leading institutions. Starhan Capital served as the financial advisor for this round.
Sand.ai’s founder, Cao Yue, stated in an interview that the team has consistently pursued the non-consensus autoregressive approach to video generation rather than the mainstream diffusion route. Its previously released Magi-1 model remains ranked first on Google DeepMind’s Physics-IQ physical realism benchmark. To break through the “cost, speed, quality” trilemma in video generation, Sand.ai shifted last year to exploring the MoE (Mixture of Experts) architecture. It plans to release a next-generation MoE-based video generation model in Q3 2026 (July) that combines efficient inference with the largest parameter scale currently available among open-source models, and it will open-source the model.
On the commercialization front, Sand.ai follows a dual-driver strategy of models and products. Its music agent product, VidMuse, launched in January this year and reached $10 million in annual recurring revenue (ARR) within just two months. Additionally, its open-source MagiAttention operator library is now used by nearly all multimodal model teams in China and has received official endorsement from NVIDIA.
Regarding the industry’s heated discussion of the “world model” concept, Cao Yue believes the field is still in a pre-GPT era (before GPT-1), with neither data nor approaches having converged. He emphasized that video is the most critical data modality on the path to world models and argued that models should learn physical laws autonomously by predicting raw video observations (pixels/frames), rather than relying on human priors to explicitly model state variables. (Source: BlockBeats)