

Poster in Workshop: 3rd Workshop on High-dimensional Learning Dynamics (HiLD)

Langevin Learning Dynamics in Lazy and Non-Lazy Wide Neural Networks

Yehonatan Avidan · Haim Sompolinsky


Abstract:

Langevin dynamics (gradient descent with additive stochastic noise) provides a powerful framework for studying learning dynamics in deep neural networks, bridging deterministic optimization and statistical inference. It has been shown to unify two prominent theories for wide networks: the Neural Tangent Kernel (NTK), which assumes linearized gradient descent dynamics, and the Bayesian Neural Network Gaussian Process (NNGP), which treats learning as posterior inference. In this work, we extend the framework to compare lazy and non-lazy learning in linear networks, analyzing how different parameters affect the learning dynamics of both the predictor and the kernel in each regime. We show that in the non-lazy case, the network is more resilient to noise and to small initial conditions.
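Below is a minimal sketch (not the authors' code) of the Langevin update the abstract refers to: plain gradient descent on a wide two-layer linear network plus additive Gaussian noise. The output scale `alpha` is a hypothetical knob used here only to illustrate the lazy vs. non-lazy distinction; the specific scaling and hyperparameters are assumptions for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data
n_samples, d_in, width = 20, 5, 512
X = rng.standard_normal((n_samples, d_in))
y = X @ rng.standard_normal(d_in)

# Two-layer linear network f(x) = a^T W x / (alpha * sqrt(width))
alpha = 1.0          # larger alpha -> lazier (more linearized) dynamics; illustrative only
W = rng.standard_normal((width, d_in)) / np.sqrt(d_in)
a = rng.standard_normal(width)

eta, T, steps = 1e-3, 1e-3, 5000   # learning rate, noise temperature, iterations

def forward(W, a, X):
    return (X @ W.T @ a) / (alpha * np.sqrt(width))

for _ in range(steps):
    err = forward(W, a, X) - y                                   # residuals
    # Gradients of the mean-squared-error loss
    grad_a = (X @ W.T).T @ err / (alpha * np.sqrt(width) * n_samples)
    grad_W = np.outer(a, err @ X) / (alpha * np.sqrt(width) * n_samples)
    # Langevin step: gradient descent plus additive white noise of variance 2*eta*T
    a -= eta * grad_a + np.sqrt(2 * eta * T) * rng.standard_normal(a.shape)
    W -= eta * grad_W + np.sqrt(2 * eta * T) * rng.standard_normal(W.shape)

print("final train MSE:", np.mean((forward(W, a, X) - y) ** 2))
```

At zero temperature (T = 0) the update reduces to ordinary gradient descent; at long times with T > 0 the parameters sample a Gibbs posterior, which is the bridge between the NTK and NNGP pictures described in the abstract.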
