

Poster in Workshop: Actionable Interpretability

Why Do Metrics Think That? Towards Understanding Large Language Models as Machine Translation Evaluators

Runzhe Zhan · Xinyi Yang · Junchao Wu · Lidia Chao · Derek Wong


Abstract:

Recent advances in large language models (LLMs) have demonstrated their exceptional ability to assess machine translation (MT) quality, often outperforming traditional metrics. However, the black-box nature of these models raises concerns about their transparency and reliability. This paper pioneers the study of mechanistic interpretability in LLM-based MT evaluation: using open-source LLMs at different scales, we investigate their internal decision-making processes to explain their performance and their discrepancies with human judgment. Our analysis reveals that LLMs often exhibit overestimation errors in evaluation. We identify the key internal layers influencing these judgments and the crucial role of in-context learning (ICL). Based on these insights, we propose two solutions: sparse MQM alignment and error-free ICL demonstration. Experiments on an underperforming Llama-3.1-8B model show that these methods effectively mitigate overestimation and improve performance by up to 4 points. Our work offers valuable insights into the internal mechanisms of LLMs for MT evaluation, contributing to the development of trustworthy LLM-based metrics.
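To make the "error-free ICL demonstration" idea concrete, below is a minimal sketch of how such a prompt might be assembled: a single source/translation pair annotated as containing no MQM errors is shown to the model before the candidate translation to be scored. The function name, demonstration sentences, and prompt wording are illustrative assumptions, not the paper's exact prompt or scoring scheme.

```python
# Minimal sketch (hypothetical prompt format): build an MQM-style evaluation
# prompt that prepends one "error-free" in-context demonstration, i.e. a
# source/translation pair labeled as having no errors, before asking the
# model to judge a new candidate translation.

def build_eval_prompt(src: str, hyp: str) -> str:
    # Error-free demonstration: a correct translation labeled with zero MQM penalty.
    demo_src = "Der Bericht wurde gestern veröffentlicht."
    demo_hyp = "The report was published yesterday."
    demo = (
        f"Source: {demo_src}\n"
        f"Translation: {demo_hyp}\n"
        "Errors: none\n"
        "MQM score: 0 (no penalty)\n"
    )
    instruction = (
        "You are an expert translation evaluator. Identify MQM errors "
        "(accuracy, fluency, terminology, style) in the translation and "
        "assign a penalty score.\n"
    )
    query = f"Source: {src}\nTranslation: {hyp}\nErrors:"
    return instruction + "\n" + demo + "\n" + query

# Example usage: the returned string would be passed to an open-source LLM
# (e.g. Llama-3.1-8B) through any text-generation interface.
print(build_eval_prompt(
    "Das Wetter ist heute schön.",
    "The weather is nice today.",
))
```

The intent of the error-free demonstration, as described in the abstract, is to counteract the model's tendency to overestimate quality by anchoring it on what a genuinely flawless translation looks like.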
