Poster in Workshop: The 2nd Workshop on Reliable and Responsible Foundation Models
A Statistical Physics of Language Model Reasoning
Jack Carson
Keywords: [ AI safety ] [ Transformer Interpretability ] [ Stochastic Dynamics ] [ Regime Switching ] [ Chain-of-Thought Reasoning ]
Transformer language models exhibit emergent reasoning capabilities that have largely resisted mechanistic understanding. We introduce a statistical-physics-inspired framework that describes the continuous-time dynamics underlying chain-of-thought reasoning in large transformer models. Specifically, we treat sentence-level hidden-state trajectories as realizations of a low-dimensional stochastic dynamical system governed by drift-diffusion processes with latent regime switching. Using empirical trajectories extracted from eight open-source transformer models evaluated on seven diverse reasoning benchmarks, we identify a rank-40 drift manifold that explains approximately 50% of the variance in reasoning trajectories, along with four distinct latent reasoning regimes. We then formulate and validate a switching linear dynamical system (SLDS) model that captures these empirical features. The framework allows transformer reasoning to be simulated at significantly reduced computational cost and offers theoretical tools for studying critical behavioral transitions, failure modes, and adversarially induced belief shifts in large language models.
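
To make the idea of a switching linear dynamical system concrete, the following is a minimal sketch of how such a model can be simulated. It is not the authors' model or code: the discrete-time update, the per-regime matrices `A`, offsets `b`, noise scales `sigma`, the sticky transition matrix `P`, and the function name `simulate_slds` are all illustrative assumptions. Only the state dimension (40) and the number of regimes (4) are taken from the abstract.

```python
import numpy as np

def simulate_slds(A, b, sigma, P, h0, z0, n_steps, rng):
    """Simulate a discrete-time switching linear dynamical system.

    Assumed (hypothetical) update, one step per sentence:
        z[t+1] ~ Categorical(P[z[t]])                          # latent regime
        h[t+1] = A[z[t+1]] @ h[t] + b[z[t+1]] + sigma[z[t+1]] * noise
    """
    n_regimes, dim, _ = A.shape
    h = np.empty((n_steps + 1, dim))
    z = np.empty(n_steps + 1, dtype=int)
    h[0], z[0] = h0, z0
    for t in range(n_steps):
        # Regime switching: sample the next regime from the Markov chain.
        z[t + 1] = rng.choice(n_regimes, p=P[z[t]])
        k = z[t + 1]
        # Drift (linear map plus offset) and diffusion (isotropic Gaussian noise).
        h[t + 1] = A[k] @ h[t] + b[k] + sigma[k] * rng.standard_normal(dim)
    return h, z

rng = np.random.default_rng(0)
n_regimes, dim = 4, 40  # four regimes, rank-40 subspace (from the abstract)

# Illustrative parameters only: each regime contracts toward a different offset.
contraction = [0.95, 0.90, 0.80, 0.70]
A = np.stack([c * np.eye(dim) for c in contraction])
b = 0.1 * rng.standard_normal((n_regimes, dim))
sigma = np.array([0.05, 0.10, 0.20, 0.40])

# "Sticky" transition matrix: stay in the current regime with probability 0.91.
P = np.full((n_regimes, n_regimes), 0.03)
np.fill_diagonal(P, 0.91)

h, z = simulate_slds(A, b, sigma, P, h0=np.zeros(dim), z0=0, n_steps=200, rng=rng)
```

Here `h` holds the simulated low-dimensional trajectory and `z` the regime label at each step. Simulating in the reduced state space avoids running the transformer itself, which is the kind of cost saving the abstract refers to; in practice the parameters would be fitted to extracted hidden-state trajectories rather than chosen by hand.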