According to 1M AI News, the Meituan LongCat team has open-sourced LongCat-Flash-Prover, a 560-billion-parameter MoE model designed for mathematical reasoning tasks in the formal theorem-proving language Lean4. The model weights are released under the MIT license and are now available on GitHub, Hugging Face, and ModelScope.
The model decomposes formal reasoning into three independent capabilities: automatic formalization (converting natural language math problems into Lean4 formal statements), sketch generation (producing lemma-style proof frameworks), and full proof generation. All three capabilities are verified in real time through interaction with the Lean4 compiler via Tool-Integrated Reasoning (TIR).
For training, the team proposes the Hybrid-Experts Iteration Framework to generate cold-start data, introduces the HisPO algorithm during the reinforcement learning phase to stabilize long-term task training for MoE models, and incorporates theorem consistency and validity detection mechanisms to prevent reward hacking.
Benchmark results show that LongCat-Flash-Prover has set new state-of-the-art (SOTA) performance in both automated formalization and theorem proving among open-weight models. It achieves a 97.1% pass rate on MiniF2F-Test with only 72 inferences per problem, and reaches 70.8% and 41.5% on ProverBench and PutnamBench respectively, with no more than 220 inferences per problem.
