Poster in Workshop: 2nd AI for Math Workshop @ ICML 2025
The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs
Jasper Dekoninck · Ivo Petrov · Kristian Minchev · Miroslav Marinov · Maria Drencheva · Lyuba Konova · Milen Shumanov · Kaloyan Tsvetkov · Nikolay Drenchev · Lazar Todorov · Kalina Nikolova · Nikolay Georgiev · Vanesa Kalinkova · Margulan Ismoldayev · Mislav Balunovic · Martin Vechev
In recent months, large language models (LLMs) have made significant progress in mathematical proof generation, but further advancement is hindered by the lack of a large-scale, high-quality dataset of human-evaluated proofs. While expensive to create, such a dataset is essential for driving improvements in training and enabling a rigorous analysis of proof generation capabilities. In this work, we present the Open Proof Corpus (OPC), a dataset comprising over 5,000 human-evaluated proofs produced by state-of-the-art LLMs. The OPC was specifically designed for broad applicability and downstream usage in proof generation research and is the first to include a substantial number of correct, LLM-generated solutions to problems from prestigious mathematics competitions such as the USAMO and IMO. Using the OPC, we explore critical questions in automated proof generation: (1) the performance gap between natural language and formal proof generation, (2) the discrepancy between final-answer accuracy and full-proof validity, and (3) the impact of best-of-n selection on proof quality. Finally, to showcase the utility of the OPC, we finetune an 8B-parameter model on the dataset, obtaining a model that performs on par with the best-performing model, Gemini-2.5-Pro, at judging proof correctness.
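The best-of-n selection studied in point (3) can be illustrated with a minimal sketch: sample n candidate proofs for a problem, score each with a judge (e.g. a proof-grading model), and keep the highest-scoring candidate. The sketch below is an assumption about how such a loop is typically structured, not the authors' actual pipeline; `generate_proofs` and `judge_score` are hypothetical placeholders for a proof generator and a correctness judge.

```python
from typing import Callable, List, Tuple


def best_of_n(
    problem: str,
    generate_proofs: Callable[[str, int], List[str]],  # hypothetical: returns n candidate proofs
    judge_score: Callable[[str, str], float],          # hypothetical: correctness score in [0, 1]
    n: int = 8,
) -> Tuple[str, float]:
    """Sample n candidate proofs and return the one the judge rates highest.

    Illustrative best-of-n selection sketch; the OPC paper studies how this
    kind of selection affects proof quality as assessed by human graders.
    """
    candidates = generate_proofs(problem, n)
    scored = [(proof, judge_score(problem, proof)) for proof in candidates]
    # Keep the candidate with the highest judge score.
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs without any model backend.
    dummy_generate = lambda problem, n: [f"Proof attempt {i} for: {problem}" for i in range(n)]
    dummy_judge = lambda problem, proof: float(len(proof) % 7) / 7.0
    best_proof, score = best_of_n("Show that sqrt(2) is irrational.", dummy_generate, dummy_judge, n=4)
    print(score, best_proof)
```

In this setup, the quality of selection is bounded by the judge: a judge that correlates poorly with human grading can prefer plausible-looking but invalid proofs, which is one reason the paper compares judge rankings against human evaluations.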