Tsinghua University and Mianbi Intelligence open-source ForgeTrain, the world's first AI-written pretraining framework

ME AI News: According to monitoring by Beating, Mianbi Intelligence and Tsinghua University’s NLP Lab have jointly open-sourced ForgeTrain, the world’s first production-grade large-model pretraining framework written entirely by AI, along with MiniCPM5-1B, a small on-device model trained with ForgeTrain. Presented as the first demonstration of a closed engineering loop of “AI creating AI,” ForgeTrain outperforms NVIDIA’s Megatron on identical hardware and achieves a 10% pretraining speedup on Huawei Ascend. MiniCPM5-1B, meanwhile, ranks first among open-weight small models on the Artificial Analysis leaderboard.

To let AI build foundational pretraining infrastructure autonomously, Mianbi Intelligence introduced a software engineering paradigm it calls “Forge Engineering.” The paradigm abandons generic frameworks designed to be compatible with all hardware and tasks, and instead uses AI’s low-cost code generation to forge specialized code tailored to a specific model and specific hardware. ForgeTrain is built in three phases: first, key data is collected from existing pretraining frameworks to form a test harness; second, binary-equivalent framework code is generated iteratively inside an automated feedback loop; finally, the equivalence constraint is removed so the generated code can surpass the reference implementation. This automated evolution corresponds to stages L3 to L4 of “AI creating AI.”

As the first model produced with ForgeTrain, MiniCPM5-1B has 1.08 billion parameters and uses the standard LlamaForCausalLM architecture, which significantly lowers the barrier to downstream integration and inference deployment. In Artificial Analysis evaluations the model scored 18 points, ahead of the larger Qwen3.5-2B (16 points) as well as Qwen3.5-0.8B (11 points) and LFM2.5-1.2B-Thinking (8 points).
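The first two phases described above (record a test harness from a reference, then accept generated code only if it is binary-equivalent) can be illustrated with a minimal sketch. It is not ForgeTrain code: all function names are hypothetical, "binary-equivalent" is read here as bit-for-bit identical floating-point output, and a simple summation stands in for a framework kernel.

```python
import functools
import math
import operator
import struct


def float_bits(x: float) -> bytes:
    """Return the raw IEEE-754 double encoding so comparisons are bit-exact."""
    return struct.pack("<d", x)


def reference_sum(values):
    """Stand-in for a reference kernel: left-to-right accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def record_harness(reference, cases):
    """Phase 1 (sketch): capture the reference outputs as a test harness."""
    return [(case, float_bits(reference(case))) for case in cases]


def is_bit_equivalent(candidate, harness):
    """Phase 2 (sketch): a candidate passes only if every output matches bit for bit."""
    return all(float_bits(candidate(case)) == expected for case, expected in harness)


def candidate_reduce(values):
    """Hypothetical generated code that keeps the same accumulation order."""
    return functools.reduce(operator.add, values, 0.0)


def candidate_fsum(values):
    """Hypothetical generated code that is more accurate, hence not bit-identical."""
    return math.fsum(values)


harness = record_harness(reference_sum, [[1e16, 1.0, -1e16], [0.5, 0.25, 0.125]])
print(is_bit_equivalent(candidate_reduce, harness))  # same order of additions
print(is_bit_equivalent(candidate_fsum, harness))    # different rounding on the first case
```

The second candidate is numerically better yet fails the check, which is why a strict equivalence phase is useful as a correctness gate before any constraint is relaxed for speed.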
The model supports deployment formats such as MLX 4-bit and GGUF Q4_K_M; after INT4 quantization its weights take up only about 0.5 GB. It natively supports a 131,072-token context window and a hybrid dual-mode inference scheme switched via enable_thinking. Taking advantage of this small hardware footprint, OpenBMB has also open-sourced MiniCPM Desk Pet, a standalone floating desktop companion app that runs entirely offline, reacts in real time to coding activity in tools like Cursor, and supports LoRA persona switching. (Source: BlockBeats)
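The 0.5 GB figure follows from simple arithmetic on the stated parameter count. The sketch below is a back-of-the-envelope estimate only: it assumes every parameter is stored at exactly the given bit width and ignores quantization scales, higher-precision layers, and file metadata, so real MLX or GGUF files will differ somewhat. The helper name is hypothetical.

```python
PARAMS = 1.08e9  # parameter count reported for MiniCPM5-1B


def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Idealized weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Compare 16-bit, 8-bit, and 4-bit storage for the same parameter count.
for label, bits in [("BF16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: {weight_size_gb(PARAMS, bits):.2f} GB")
```

At 4 bits per parameter the estimate comes to roughly 0.54 GB, consistent with the reported figure of about 0.5 GB.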