

Poster in Workshop: 2nd AI for Math Workshop @ ICML 2025

FMC: Formalization of Natural Language Mathematical Competition Problems

Jiaxuan Xie · Chengwu Liu · Ye Yuan · Siqi Li · Zhiping Xiao · Ming Zhang


Abstract: Efficient and accurate autoformalization methods, which leverage large-scale databases of natural language mathematical problems to construct formal language datasets, are key to advancing formal mathematical reasoning. In this paper, we propose an autoformalization pipeline based on large language models with error feedback, yielding a fully automatic and training-free formalization approach. Using this pipeline, we establish an Olympiad-level dataset that aligns natural language problems with Lean formalizations. The dataset contains $3,922$ mathematical problems in natural language and $9,787$ in Lean, of which $64.46$% received quality ratings of good or above, making it suitable as a benchmark for automated theorem provers. Additionally, we investigate the formalization and reasoning capabilities of various general large language models and experimentally demonstrate that few-shot learning, error feedback, and increasing the number of samples positively impact autoformalization performance.
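The abstract describes an LLM-based autoformalization loop with error feedback. The sketch below illustrates one plausible shape of such a loop, not the authors' actual pipeline: an LLM drafts a Lean statement, a Lean checker reports errors, and the errors are fed back into the next prompt. The helpers `generate_lean` and `check_lean` are hypothetical callables supplied by the user (e.g., an LLM API wrapper and a Lean compiler invocation); only the retry-with-feedback structure is shown.

```python
from typing import Callable, Optional


def autoformalize_with_feedback(
    problem: str,
    generate_lean: Callable[[str, Optional[str]], str],
    check_lean: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> Optional[str]:
    """Hypothetical error-feedback loop for autoformalization.

    generate_lean(problem, error) -> candidate Lean statement
        (error is None on the first attempt; later it carries checker output).
    check_lean(code) -> None if the code type-checks, else an error message.
    Returns a Lean statement accepted by the checker, or None if all rounds fail.
    """
    error: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate_lean(problem, error)   # draft or repair a formalization
        error = check_lean(candidate)               # compile/type-check the candidate
        if error is None:                           # accepted: no checker errors
            return candidate
    return None                                     # give up after max_rounds attempts
```

Increasing `max_rounds` plays the role of a larger sampling budget: each extra round is another attempt conditioned on the previous failure, which is consistent with the abstract's observation that error feedback and more samples both help.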
