

Poster in Workshop: 2nd Workshop on Models of Human Feedback for AI Alignment (MoFA)

Do Language Models Understand Discrimination? Testing Alignment with Human Legal Reasoning under the ECHR

Tatiana Botskina


Abstract:

We investigate the extent to which large language models (LLMs) are aligned with established legal norms by evaluating their ability to reason about discrimination under the European Convention on Human Rights (ECHR). Although existing work on bias in AI focuses primarily on statistical disparities, our study shifts the emphasis to normative reasoning: testing whether LLMs can interpret, apply and justify legal decisions in line with formal legal standards. We introduce a structured framework grounded in ECHR case law, formalising the legal concept of discrimination into testable scenarios. Our empirical findings reveal that current LLMs frequently fail to replicate key aspects of legal reasoning, such as identifying protected characteristics, applying proportionality, or articulating justifications consistent with judicial logic. These results expose critical gaps in the legal alignment of today’s models and point to the need for domain-specific feedback and normative alignment methods to build trustworthy and fair AI systems for high-stakes applications.
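The abstract describes formalising Article 14 ECHR discrimination into testable scenarios and checking whether models identify protected grounds, apply proportionality, and reach court-consistent conclusions. The sketch below illustrates what such a scenario schema and a simple scoring step could look like; the class, field names, prompt wording, and keyword-based scoring are assumptions for illustration only, not the authors' actual framework.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Illustrative sketch only: the schema and scoring are hypothetical and are
# not taken from the paper's framework.

@dataclass
class DiscriminationScenario:
    facts: str                     # short factual summary drawn from a case
    protected_characteristic: str  # e.g. "sex", "religion" (an Article 14 ground)
    comparator: str                # the analogously situated group
    differential_treatment: str    # how the applicant was treated differently
    legitimate_aim: Optional[str]  # justification advanced by the state, if any
    court_finding: str             # "violation" or "no violation" of Article 14

def build_prompt(s: DiscriminationScenario) -> str:
    """Turn a scenario into a question posed to a model under evaluation."""
    return (
        "Under Article 14 ECHR, assess whether the following facts amount to "
        "discrimination. Identify the protected ground, the comparator, any "
        "legitimate aim, and whether the treatment was proportionate.\n\n"
        f"Facts: {s.facts}"
    )

def score_response(response: str, s: DiscriminationScenario) -> Dict[str, bool]:
    """Crude keyword check of whether an answer tracks the legal test."""
    text = response.lower()
    return {
        "identifies_ground": s.protected_characteristic.lower() in text,
        "mentions_proportionality": "proportion" in text,
        "matches_finding": s.court_finding.lower() in text,
    }
```

In practice, one would replace the keyword check with a more robust rubric or human annotation; the point of the sketch is only to show how a legal test can be decomposed into discrete, machine-checkable elements.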
