Tsinghua University and Mianbi open-source ForgeTrain, the world's first AI-written pre-training framework
KuCoinFlash
Summary
Tsinghua University and Mianbi open-sourced ForgeTrain, described as the world’s first production-grade large-model pre-training framework written entirely by AI. The framework outperforms NVIDIA’s Megatron on identical hardware and achieves a 10% pre-training speedup on Huawei Ascend. It was also used to train MiniCPM5-1B, a compact model that ranks first among open-weight small models on the Artificial Analysis leaderboard.
ME AI News: According to monitoring by Beating, Mianbi Intelligence and Tsinghua University’s NLP Lab have jointly open-sourced ForgeTrain, the world’s first production-grade large-model pre-training framework written entirely by AI, along with MiniCPM5-1B, a small on-device model trained with ForgeTrain. As the first demonstration of a closed engineering loop in which “AI creates AI,” ForgeTrain outperforms NVIDIA’s Megatron under identical hardware conditions and achieves a 10% pre-training speedup on Huawei Ascend. Meanwhile, MiniCPM5-1B ranks #1 among open-weight small models on the Artificial Analysis leaderboard.
To let AI build foundational pre-training infrastructure on its own, Mianbi Intelligence introduced a software programming paradigm it calls “Forge Engineering.” Rather than relying on generic frameworks designed to support every kind of hardware and task, the approach uses AI’s low-cost code generation to forge specialized code tailored to a specific model and hardware target. ForgeTrain is built in three phases: first, key data is collected from existing pre-training frameworks to form a test harness; second, framework code that is binary-equivalent to the reference is generated iteratively within an automated feedback loop; finally, those constraints are lifted so the code can surpass the reference implementation. This fully automated evolution corresponds to stages L3 to L4 of “AI creating AI.”
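The three-phase process resembles differential testing against a recorded reference. The toy Python sketch below illustrates only that general idea; it is not ForgeTrain’s code, and every function name in it is hypothetical. It uses RMSNorm, the normalization layer found in Llama-style models, as a stand-in kernel: phase 1 records reference outputs on fixed inputs, phase 2 accepts a rewritten candidate only if it reproduces those outputs bit for bit, and phase 3 is assumed here to mean relaxing exact equality to a numerical tolerance. The epsilon and tolerance values are arbitrary assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Kernel = Callable[[Sequence[float]], Vector]
Harness = List[Tuple[Vector, Vector]]

EPS = 1e-6  # assumed epsilon, chosen only for illustration


def reference_rms_norm(xs: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a kernel taken from an existing framework."""
    total = 0.0
    for x in xs:
        total += x * x
    scale = 1.0 / math.sqrt(total / len(xs) + EPS)
    return [x * scale for x in xs]


def build_harness(ref: Kernel, inputs: Sequence[Sequence[float]]) -> Harness:
    """Phase 1: record (input, reference output) pairs."""
    return [(list(x), ref(x)) for x in inputs]


def is_bit_equivalent(candidate: Kernel, harness: Harness) -> bool:
    """Phase 2: require exact float equality on every recorded case."""
    return all(candidate(x) == expected for x, expected in harness)


def is_close(candidate: Kernel, harness: Harness, rel_tol: float = 1e-5) -> bool:
    """Phase 3 (assumed form): accept results within a relative tolerance."""
    for x, expected in harness:
        got = candidate(x)
        if len(got) != len(expected):
            return False
        if not all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(got, expected)):
            return False
    return True


def candidate_restructured(xs: Sequence[float]) -> Vector:
    """Same float operations in the same order, just written differently."""
    total = 0.0
    for x in xs:
        total += x * x
    inv = 1.0 / math.sqrt(total / len(xs) + EPS)
    out = []
    for x in xs:
        out.append(x * inv)
    return out


def candidate_relaxed(xs: Sequence[float]) -> Vector:
    """Different summation and division order: may differ in the last bits."""
    denom = math.sqrt(math.fsum(x * x for x in xs) / len(xs) + EPS)
    return [x / denom for x in xs]


def candidate_buggy(xs: Sequence[float]) -> Vector:
    """Forgets the square root, so it should fail both checks."""
    total = sum(x * x for x in xs)
    scale = 1.0 / (total / len(xs) + EPS)
    return [x * scale for x in xs]


harness = build_harness(
    reference_rms_norm,
    [[1.0, 2.0, 3.0], [0.5, -0.5], [1e-3, 4.0, -2.5, 7.25]],
)
print(is_bit_equivalent(candidate_restructured, harness))  # True
print(is_bit_equivalent(candidate_buggy, harness))  # False
print(is_close(candidate_relaxed, harness))  # True
```

Exact equality is only a meaningful target when the candidate performs the same floating-point operations in the same order. Reordering a summation, as candidate_relaxed does, can change the last bits of the result, which is why some looser check is needed once the equivalence constraint is dropped.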
As the first model trained with ForgeTrain, MiniCPM5-1B has 1.08 billion parameters and is built on the standard LlamaForCausalLM architecture, which significantly lowers the barrier to downstream integration and inference deployment. In Artificial Analysis evaluations, the model scored 18 points, ahead of the larger Qwen3.5-2B (16 points), Qwen3.5-0.8B (11 points), and LFM2.5-1.2B-Thinking (8 points). The model supports deployment formats such as MLX 4-bit and GGUF Q4_K_M; after INT4 quantization its weights take up only 0.5 GB. It natively supports a 131,072-token context window and hybrid inference that switches between thinking and non-thinking modes via the enable_thinking switch. Taking advantage of the model’s low hardware requirements, OpenBMB has also open-sourced MiniCPM Desk Pet, a standalone floating desktop companion app that runs entirely offline, reacts in real time to coding activity in tools such as Cursor, and supports persona switching through LoRA adapters.
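The reported 0.5 GB figure is consistent with simple arithmetic: at 4 bits per weight, 1.08 billion parameters occupy about 0.54 × 10^9 bytes, which is roughly 0.50 GiB. The short Python sketch below shows the calculation. The 16-bit row is an assumed baseline for comparison, and the estimate counts raw weights only, ignoring quantization scales, any tensors kept at higher precision, and file metadata, so actual INT4 or Q4_K_M files are likely somewhat larger.

```python
PARAMS = 1_080_000_000  # 1.08 billion parameters, as reported for MiniCPM5-1B


def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Raw weight storage only: no scales, zero points, or file metadata."""
    return num_params * bits_per_weight / 8


for label, bits in [("16-bit (assumed baseline)", 16), ("INT4", 4)]:
    size = weight_bytes(PARAMS, bits)
    print(f"{label}: {size / 1e9:.2f} GB = {size / 2**30:.2f} GiB")
```

For the INT4 row this prints 0.54 GB = 0.50 GiB, roughly a quarter of the 16-bit baseline’s 2.16 GB.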
(Source: BlockBeats)
Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information.
Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.