Poster in Workshop: 2nd AI for Math Workshop @ ICML 2025
Plane Geometry Diagram Formalization via Vision-Language Models
Xiaoteng Cui · Yi Liu
Large models such as vision-language models (VLMs) have demonstrated robust world-knowledge comprehension, inspiring advances in automated mathematical problem solving. In geometry problem solving, however, the intricate and diverse abstract relationships encoded in geometry diagrams make it difficult to apply large models effectively. To improve geometry problem-solving accuracy, we analyze existing problem-solving paradigms and propose using VLMs to improve the accuracy of diagram autoformalization. First, we construct a multimodal instruction-tuning dataset, GeometryDiagramFormalization86K (GDF86K), by applying data augmentation based on algebraic commutativity to the Geometry3K dataset. The dataset contains over 86,000 image-caption pairs for training diagram autoformalization models. Using GDF86K, we perform supervised fine-tuning to build Geo-TinyLLaVA, a vision-language model specialized in geometry diagram autoformalization. When given input diagrams with complete point annotations, Geo-TinyLLaVA outperforms the conventional Inter-GPS diagram parser in autoformalization performance and can serve as a plugin that improves the accuracy of the downstream geometry problem-solving system.
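To make "data augmentation based on algebraic commutativity" concrete, here is a minimal sketch under stated assumptions; it is not the authors' pipeline. It assumes diagram captions are Geometry3K-style logic forms such as `Equals(LengthOf(Line(A, B)), Add(x, 3))`, and that a hypothetical set of operators (`Equals`, `Add`, `Mul`, `Line`) is treated as order-insensitive. The sketch parses a logic form into a tree and enumerates every string obtainable by swapping the two arguments of those operators. Each variant could then be paired with the same diagram image to yield additional image-caption pairs.

```python
from itertools import product

# HYPOTHETICAL: operators whose two arguments are assumed order-insensitive.
# The exact set used to build GDF86K is not specified in the abstract.
COMMUTATIVE = {"Equals", "Add", "Mul", "Line"}


def parse(s):
    """Parse a logic form like 'Add(x, 3)' into (name, [args]) tuples; leaves are strings."""
    pos = 0

    def node():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        name = s[start:pos].strip()
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume '('
            args = [node()]
            while s[pos] == ",":
                pos += 1  # consume ','
                args.append(node())
            pos += 1  # consume ')'
            return (name, args)
        return name

    return node()


def render(tree):
    """Turn a parsed tree back into a logic-form string."""
    if isinstance(tree, str):
        return tree
    name, args = tree
    return f"{name}({', '.join(render(a) for a in args)})"


def variants(tree):
    """List every tree reachable by swapping arguments of commutative binary operators."""
    if isinstance(tree, str):
        return [tree]
    name, args = tree
    out = []
    # Combine all variants of the children, then optionally swap at this node.
    for combo in product(*(variants(a) for a in args)):
        out.append((name, list(combo)))
        if name in COMMUTATIVE and len(combo) == 2:
            out.append((name, [combo[1], combo[0]]))
    return out


def augment(logic_form):
    """Return the sorted, de-duplicated augmented captions for one logic form."""
    return sorted({render(v) for v in variants(parse(logic_form))})


if __name__ == "__main__":
    for caption in augment("Equals(LengthOf(Line(A, B)), Add(x, 3))"):
        print(caption)
```

For this example, the swaps at `Line`, `Add`, and `Equals` are independent, so the script prints 2 × 2 × 2 = 8 distinct captions. `LengthOf` is left untouched because it is not in the assumed commutative set.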