Skip to yearly menu bar Skip to main content


Poster
in
Workshop: 2nd AI for Math Workshop @ ICML 2025

Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization

Yuntian Liu · Tao Zhu · Xiaoyang LIU · Yu Chen · Liu ZhaoXuan · Guo qingfeng · Jiashuo Zhang · Kangjie Bao · Tao Luo


Abstract:

Statement autoformalization, the automated translation of statement from natural language into formal languages, has become a subject of extensive research, yet the development of robust automated evaluation metrics remains limited. Existing evaluation methods often lack semantic understanding, face challenges with high computational costs, and are constrained by the current progress of automated theorem proving. To address these issues, we propose \textbf{GTED} (\textit{\underline{G}eneralized \underline{T}ree \underline{E}dit \underline{D}istance}), a novel evaluation framework that first standardizes formal statements and converts them into operator trees, then determines the semantic similarity using the eponymous GTED metric. On the miniF2F and ProofNet benchmarks, GTED outperforms all baseline metrics by achieving the highest accuracy and Kappa scores, thus providing the community with a more faithful metric for automated evaluation. Implementation code will be released following peer review, while the experimental results are available for review: \url{https://anonymous.4open.science/r/Formal-Semantic-Evaluation-ED4D/}.

Chat is not available.