Poster
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
Gül Sena Altıntaş · Devin Kwok · Colin Raffel · David Rolnick
East Exhibition Hall A-B #E-2106
Due to training noise, two neural networks trained from the same random starting point can learn any of many different solutions to the same problem, whereas pre-trained networks tend to converge to the same solution. What we don't yet know is when and how networks switch from learning different solutions to learning the same one. To answer this question, we train twin copies of a neural network in exactly the same way, but apply a tiny change (a perturbation) to one of the copies during training. We find that for networks starting from random initialization, even the tiniest change (far smaller than typical random effects) causes training to reach a different solution, whereas pre-trained networks only learn different solutions when the perturbation is much larger than random effects. Our findings matter because we often need to retrain and combine knowledge from several huge networks, such as large language models. Since some methods work better when solutions are similar and others when they differ, our results let us tailor retraining and model-combining methods to best target each case.
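The twin-training idea above can be sketched in miniature. The toy example below (a hypothetical NumPy-only illustration, not the paper's actual experimental setup, models, or perturbation sizes) trains two copies of a tiny two-layer network from an identical initialization, adds a single tiny perturbation (1e-8 on one weight) to copy B, trains both identically, and measures how far apart their parameters end up:

```python
import numpy as np

def init_params(rng, d_in=2, d_hidden=16, d_out=1):
    """Random init for a small two-layer ReLU regression network."""
    return {
        "W1": rng.normal(0, 0.5, (d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(0, 0.5, (d_hidden, d_out)),
        "b2": np.zeros(d_out),
    }

def forward(p, X):
    h = np.maximum(0, X @ p["W1"] + p["b1"])  # ReLU hidden layer
    return h @ p["W2"] + p["b2"], h

def sgd_step(p, X, y, lr=0.1):
    """One full-batch gradient step on mean squared error."""
    yhat, h = forward(p, X)
    g = (yhat - y) / len(X)                   # dL/dyhat for MSE/2
    gW2, gb2 = h.T @ g, g.sum(0)
    gh = (g @ p["W2"].T) * (h > 0)            # backprop through ReLU
    gW1, gb1 = X.T @ gh, gh.sum(0)
    for k, gk in zip(["W1", "b1", "W2", "b2"], [gW1, gb1, gW2, gb2]):
        p[k] -= lr * gk
    return p

def param_distance(p, q):
    """Euclidean distance between two full parameter vectors."""
    return np.sqrt(sum(np.sum((p[k] - q[k]) ** 2) for k in p))

# Fixed toy regression data, shared by both twins.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = np.sin(X[:, :1]) + X[:, 1:] ** 2

# Twin networks: identical initialization...
pA = init_params(np.random.default_rng(42))
pB = {k: v.copy() for k, v in pA.items()}
# ...except one tiny perturbation to a single weight in copy B.
pB["W1"][0, 0] += 1e-8

dist0 = param_distance(pA, pB)
for _ in range(200):                          # identical training for both
    pA = sgd_step(pA, X, y)
    pB = sgd_step(pB, X, y)
dist1 = param_distance(pA, pB)
print(f"distance at start: {dist0:.2e}, after training: {dist1:.2e}")
```

In this toy setting the perturbation merely persists or drifts; the paper's point is that at scale, from a random starting point, such a perturbation is amplified until the twins reach genuinely different solutions, while from a pre-trained starting point it is not.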