Poster in Workshop: Actionable Interpretability
Why Do Metrics Think That? Towards Understanding Large Language Models as Machine Translation Evaluators
Runzhe Zhan · Xinyi Yang · Junchao Wu · Lidia Chao · Derek Wong
Recent advances in large language models (LLMs) have demonstrated their exceptional ability to assess machine translation (MT) quality, often outperforming traditional metrics. However, the black-box nature of these models poses challenges for transparency and reliability. This paper pioneers the study of mechanistic interpretability in LLM-based MT evaluation: using open-source LLMs at different scales, we investigate their internal decision-making processes to understand their performance and explain discrepancies with human judgment. Our analysis reveals that LLMs often exhibit overestimation errors in evaluation. We identify the key internal layers influencing these judgments and the crucial role of in-context learning (ICL). Based on these insights, we propose two solutions: sparse MQM alignment and error-free ICL demonstration. Experiments on an underperforming Llama-3.1-8B model show that these methods effectively mitigate overestimation and improve performance by up to 4 points. Our work offers valuable insights into the internal mechanisms of LLMs as MT evaluators, contributing to the development of trustworthy LLM-based metrics.
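The abstract names error-free ICL demonstration as one mitigation for overestimation. The sketch below illustrates, under assumptions, how such a demonstration (an error-free translation labeled with the top score) might be composed into an MQM-style scoring prompt for an open-source LLM; the 0-100 scale, the demonstration pair, and the prompt wording are illustrative, not the authors' exact setup.

```python
# Minimal sketch of an "error-free" in-context demonstration for MQM-style
# MT quality scoring. All concrete values here are assumptions for illustration.

ERROR_FREE_DEMO = {
    "source": "Der Vertrag wurde gestern unterzeichnet.",
    "translation": "The contract was signed yesterday.",
    "score": 100,  # assumed top of an MQM-inspired 0-100 quality scale
}

def build_prompt(source: str, translation: str, demo: dict = ERROR_FREE_DEMO) -> str:
    """Compose a scoring prompt that prepends one error-free demonstration."""
    return (
        "Score the translation quality from 0 (worst) to 100 (best).\n\n"
        f"Source: {demo['source']}\n"
        f"Translation: {demo['translation']}\n"
        f"Score: {demo['score']}\n\n"
        f"Source: {source}\n"
        f"Translation: {translation}\n"
        "Score:"
    )

if __name__ == "__main__":
    # The resulting string would be passed to an open-source LLM
    # (e.g., Llama-3.1-8B) for scoring.
    print(build_prompt(
        "Das Wetter ist heute schön.",
        "The weather is nice today.",
    ))
```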