Skip to yearly menu bar Skip to main content


Poster
in
Workshop: 2nd AI for Math Workshop @ ICML 2025

Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs

Terry Jingchen Zhang · Wenyuan Jiang · Rongchuan Liu · Yisong Wang · Ning Wang · Junran Yang · Yinya Huang · Mrinmaya Sachan


Abstract:

Formal theorem proving (FTP) stands at the forefront of LLM reasoning, yet existing datasets are largely limited in scope due to their dependence on manual curation. We identify theoretical computer science (TCS) as a promising domain as many TCS problems can be automatically modeled and verified, thereby enabling scalable formalization with rigorous alignment guarantee.We showcase this approach by two TCS problem modules: the Busy Beaver (BB) challenge and the Mixed Boolean–Arithmetic (MBA) challenge. We evaluate leading reasoning models and dedicated theorem provers on our benchmark, the best-performing DeepSeekProver-v2-671B attains only 57.5\% on BB challenge and no more than 12\% on MBA challenge. These results reveal substantial reasoning gaps beyond conventional static benchmarks. We call on the community to further investigate the potential of TCS problems in FTP domain where our approach enables fully automated synthesis of arbitrarily many problems that are universally easy to verify yet systematically hard to prove.

Chat is not available.