ICML A Statistical Physics of Language Model Reasoning

Poster in Workshop: The 2nd Workshop on Reliable and Responsible Foundation Models

A Statistical Physics of Language Model Reasoning

Jack Carson

Keywords: [ AI safety ] [ Transformer Interpretability ] [ Stochastic Dynamics ] [ Regime Switching ] [ Chain-of-Thought Reasoning ]


Abstract:

Transformer language models exhibit emergent reasoning capabilities that have largely resisted mechanistic understanding. We introduce a statistical-physics-inspired framework to describe the continuous-time dynamics underlying chain-of-thought reasoning in large transformer models. Specifically, we analyze sentence-level hidden-state trajectories as realizations of a low-dimensional stochastic dynamical system governed by drift-diffusion processes with latent regime switching. Using empirical trajectories extracted from eight open-source transformer models evaluated on seven diverse reasoning benchmarks, we identify a rank-40 drift manifold that explains approximately 50% of the variance in reasoning trajectories, along with four distinct latent reasoning regimes. We then formulate and validate a switching linear dynamical system model that captures these empirical features. This framework allows simulation of transformer reasoning at significantly reduced computational cost, offering theoretical tools to study critical behavioral transitions, failure modes, and adversarially induced belief shifts in large language models.
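
To make the idea of a drift-diffusion process with latent regime switching more concrete, the following is a minimal toy sketch, not the model fitted in the paper. It simulates a switching linear dynamical system in which each regime pulls the state toward its own attractor with a linear drift and adds Gaussian noise. The function name `simulate_slds`, the three-dimensional state, the sticky transition rule, and every parameter value are assumptions chosen for illustration; only the count of four regimes echoes the abstract.

```python
import math
import random


def simulate_slds(n_steps, dim=3, n_regimes=4, stay_prob=0.9, dt=1.0, seed=0):
    """Simulate a toy switching linear dynamical system.

    Returns (states, regimes): n_steps + 1 state vectors and the
    regime index that was active when each one was produced.
    """
    rng = random.Random(seed)

    # Hypothetical per-regime parameters (not fitted to any model):
    # regime k pulls the state toward attractor mu[k] at rate theta[k]
    # and injects isotropic Gaussian noise of scale sigma[k].
    mu = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_regimes)]
    theta = [0.1 + 0.2 * k for k in range(n_regimes)]
    sigma = [0.05 * (k + 1) for k in range(n_regimes)]

    regime = 0
    state = [0.0] * dim
    states, regimes = [list(state)], [regime]

    for _ in range(n_steps):
        # Sticky Markov chain: keep the current regime with probability
        # stay_prob, otherwise jump uniformly to one of the other regimes.
        if rng.random() > stay_prob:
            regime = rng.choice([k for k in range(n_regimes) if k != regime])

        # Euler-Maruyama step: linear drift toward mu[regime] plus diffusion.
        state = [
            x
            + dt * theta[regime] * (m - x)
            + math.sqrt(dt) * sigma[regime] * rng.gauss(0.0, 1.0)
            for x, m in zip(state, mu[regime])
        ]
        states.append(state)
        regimes.append(regime)

    return states, regimes


states, regimes = simulate_slds(200)
print(len(states), sorted(set(regimes)))
```

Because each step needs only a few arithmetic operations per state dimension, a surrogate of this kind is far cheaper to run than a forward pass through a transformer, which is the sort of cost reduction the abstract refers to. The actual state dimension, drift structure, and transition probabilities would have to come from the fitted model.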

