Poster
in
Workshop: Methods and Opportunities at Small Scale (MOSS)

From SGD to Spectra: A Theory of Neural Network Weight Dynamics

Brian Olsen · Sam Fatehmanesh · Frank Xiao · Adarsh Kumarappan · Anirudh Gajula

Keywords: [ Dyson Brownian motion ] [ SGD dynamics ] [ stochastic differential equations ] [ heavy-tailed distributions ] [ singular-value spectra ]


Abstract:

Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices. We derive exact SDEs showing that squared singular values follow Dyson Brownian motion with eigenvalue repulsion, and we characterize the stationary distributions as gamma-type densities with power-law tails, providing the first theoretical explanation for the heavy-tailed "bulk+tail" spectral structure observed empirically in trained networks. Through controlled experiments on transformer and MLP architectures, we validate these theoretical predictions and demonstrate quantitative agreement between SDE-based forecasts and observed spectral evolution, giving a rigorous foundation for understanding why deep learning works.
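The Dyson Brownian motion mentioned in the abstract can be illustrated with a minimal Euler-Maruyama simulation. The drift and noise terms below are a generic textbook form of DBM, not the paper's exact SDE for squared singular values (the function name, parameters, and step sizes are assumptions for illustration): each eigenvalue feels a pairwise repulsion proportional to the inverse gap to every other eigenvalue, plus independent Brownian noise.

```python
import numpy as np

def dyson_brownian_motion(n=8, steps=2000, dt=1e-4, beta=2.0, seed=0):
    """Sketch of generic Dyson Brownian motion via Euler-Maruyama:
        d lam_i = (beta/2) * sum_{j != i} dt / (lam_i - lam_j) + dW_i
    NOTE: this is an illustrative form of DBM, not the exact SDE
    derived in the paper for squared singular values under SGD.
    """
    rng = np.random.default_rng(seed)
    lam = np.sort(rng.normal(size=n))  # distinct initial eigenvalues
    for _ in range(steps):
        diff = lam[:, None] - lam[None, :]      # pairwise gaps lam_i - lam_j
        np.fill_diagonal(diff, np.inf)           # exclude the i == j term
        repulsion = (beta / 2.0) * np.sum(1.0 / diff, axis=1)
        lam = lam + repulsion * dt + np.sqrt(dt) * rng.normal(size=n)
    return np.sort(lam)
```

The characteristic effect of the repulsion term is that eigenvalues never collide: gaps that shrink generate a large restoring drift, which is the mechanism behind the level repulsion seen in the spectra of trained weight matrices.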
