Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)
Understanding How Chess-Playing Language Models Compute Linear Board Representations
Aaron Mei
Keywords: [ World Models ] [ Mechanistic Interpretability ] [ Chess ] [ Language Models ]
The field of mechanistic interpretability seeks to understand the internal workings of neural networks, particularly language models. While previous research has demonstrated that language models trained on games can develop linear board representations, the mechanisms by which these representations arise remain unknown. This work investigates the internals of a GPT-2 style transformer trained on chess PGNs and proposes an algorithm for how the model computes the board state.
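As an illustration of what "linear board representation" means in this line of work, the sketch below trains a linear probe to decode per-square piece identity from residual-stream activations. It is not the author's code: the dimensions, the 13-class square encoding, and the activations themselves (random stand-ins so the script runs) are all assumptions for demonstration purposes.

```python
# Minimal probing sketch (assumed setup, not the paper's implementation).
# A linear probe maps residual-stream activations to a predicted board state;
# high held-out accuracy would indicate a linear board representation.
import torch
import torch.nn as nn

d_model, n_squares, n_classes = 512, 64, 13   # 12 piece types + empty (assumed encoding)
n_positions = 2048

# Stand-in data: a real experiment would cache activations from the chess model
# at a fixed layer and token position, paired with the true board after each move.
acts = torch.randn(n_positions, d_model)
boards = torch.randint(0, n_classes, (n_positions, n_squares))

# One linear map per square, packed into a single Linear(d_model, 64 * 13).
probe = nn.Linear(d_model, n_squares * n_classes)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = probe(acts).view(n_positions, n_squares, n_classes)
    loss = loss_fn(logits.reshape(-1, n_classes), boards.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Per-square accuracy (meaningless on random data; reported on held-out
# positions in real probing experiments).
preds = probe(acts).view(n_positions, n_squares, n_classes).argmax(-1)
print(f"probe accuracy (synthetic data): {(preds == boards).float().mean():.3f}")
```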