Poster
MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
Ruida Wang · Rui Pan · Yuxin Li · Jipeng Zhang · Yizhen Jia · Shizhe Diao · Renjie Pi · Junjie Hu · Tong Zhang
East Exhibition Hall A-B #E-2604
Solving mathematical problems using computer-verifiable languages like Lean has significantly impacted the mathematical and computer science communities. State-of-the-art methods use a single Large Language Model (LLM) either to generate complete proofs or to perform tree search, but they fail to balance these tasks. We propose MA-LoT: Model-CollAboration Lean-based Long Chain-of-Thought, a comprehensive framework for Lean4 theorem proving that addresses this issue. It separates the cognitive tasks of natural-language reasoning for whole-proof generation and error analysis for proof correction through model collaboration, realized by structured interaction between the LLM and the Lean4 verifier within Long CoT. To implement the framework, we propose the novel LoT-Transfer Learning training-inference pipeline, which endows LLMs with Long CoT thinking capability without special data annotation. Extensive experiments show that our framework achieves 61.07% accuracy on the Lean4 version of the MiniF2F-Test dataset, largely outperforming DeepSeek-V3 (33.61%), single-model tree search (InternLM-Step-Prover, 50.70%), and whole-proof generation (Godel-Prover, 55.33%) baselines. More broadly, our findings highlight the potential of combining Long CoT with formal verification for more insightful generation.
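The prover-corrector interaction can be pictured as a simple loop: one model drafts a whole proof, the Lean4 verifier checks it, and a second model revises the proof based on the verifier's error log. The sketch below is minimal and illustrative only; the function names and signatures (`generate`, `correct`, `lean_check`) are hypothetical placeholders, not the authors' actual interface.

```python
from typing import Callable, Optional, Tuple

def prove(
    statement: str,
    generate: Callable[[str], str],                      # prover LLM: statement -> whole Lean4 proof
    correct: Callable[[str, str, str], str],             # corrector LLM: (statement, proof, error log) -> revised proof
    lean_check: Callable[[str, str], Tuple[bool, str]],  # Lean4 verifier: (statement, proof) -> (ok, error log)
    max_rounds: int = 3,
) -> Optional[str]:
    """One whole-proof attempt, then up to max_rounds of verifier-guided correction."""
    proof = generate(statement)
    for _ in range(max_rounds):
        ok, error_log = lean_check(statement, proof)
        if ok:
            return proof  # verified proof found
        # Long-CoT error analysis grounded in the verifier's feedback
        proof = correct(statement, proof, error_log)
    return None  # unproved within the round budget
```

The point of the separation is that drafting a proof and diagnosing a failed one are different cognitive tasks, so each is handled by a model specialized for it rather than by a single model juggling both.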
How can we make AI models reason like humans? Not just produce words based on an educated guess from training data, but truly ground the reasoning in a concrete foundation. Researchers approach this question by putting mathematical reasoning into verifiable frameworks and using LLMs to solve the problems. Traditional methods either guess the entire proof in one shot, which is error-prone and hard to correct, or tediously check each step, which slows everything down.

Our work introduces MA-LoT, a new kind of AI framework that learns to do both: plan and double-check a proof like a mathematician. One model sketches the big idea of a proof, and another model revises it based on feedback, just like a student learning from a tutor's comments.

To make this work, we created a new training-inference method called LoT-Transfer Learning. It helps the model develop deep thinking skills without needing huge amounts of hand-labeled data. As a result, our framework solved some of the toughest problems from international math contests, more accurately and efficiently than prior approaches.

This isn't just about math. It's a step toward AI systems that can think carefully, learn from mistakes, and help in fields where correctness really matters, like software verification and education.
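To make "computer-verifiable" concrete, here is a toy Lean 4 theorem of the kind the verifier checks; it is illustrative only and not drawn from the MiniF2F benchmark.

```lean
-- A toy Lean 4 statement: addition of natural numbers is commutative.
-- Lean accepts the file only if the supplied term actually proves the claim;
-- anything short of a correct proof is rejected with an error message.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

That error message is exactly the kind of feedback the revising model learns from, in place of a human tutor's comments.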