Spotlight Poster
Rapid Overfitting of Multi-Pass SGD in Stochastic Convex Optimization
Shira Vansover-Hager · Tomer Koren · Roi Livni
West Exhibition Hall B2-B3 #W-810
AI models are often trained using an algorithm called stochastic gradient descent (SGD), which processes data iteratively and updates the model based on each example it sees. In its basic form, SGD makes a single pass over the training data and is known to generalize well, meaning it performs well on new, unseen examples.

In practice, however, it's common to make multiple passes over the same data to improve performance. This raises a key question: what are the limits of reusing data in SGD when it comes to generalization?

Our research shows that generalization can break down surprisingly quickly, even after just one additional pass. In cases where the one-pass version performs optimally, a second pass can already lead to catastrophic overfitting, where the model memorizes the training data instead of learning patterns that apply more broadly.

We analyze this behavior and identify a kind of phase transition after the first pass, where generalization begins to break down. These findings reveal a gap between theory and practice, pointing to the need for new theoretical tools to understand why multi-pass training often appears to succeed in practice, despite these fundamental limitations.
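To make the one-pass versus multi-pass distinction concrete, here is a minimal Python sketch on a synthetic least-squares problem. Every choice in it (dimensions, sample sizes, noise level, step size) is an illustrative assumption and not a construction from the paper; on such a benign toy problem the gap between training and test loss stays small, whereas the paper shows there exist convex problems where even a second pass already makes this gap large.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stochastic convex problem: noisy linear regression with synthetic data.
    # All sizes and the step size below are illustrative, not the paper's setup.
    d, n_train, n_test = 50, 100, 10_000
    w_true = rng.normal(size=d) / np.sqrt(d)

    def sample(n):
        X = rng.normal(size=(n, d))
        y = X @ w_true + 0.5 * rng.normal(size=n)
        return X, y

    X_tr, y_tr = sample(n_train)
    X_te, y_te = sample(n_test)

    def loss(w, X, y):
        # Average squared error of the linear model w on the dataset (X, y).
        return 0.5 * np.mean((X @ w - y) ** 2)

    def sgd(num_passes, lr=0.01):
        # Run SGD; each pass reuses the same n_train examples in shuffled order.
        w = np.zeros(d)
        for _ in range(num_passes):
            for i in rng.permutation(n_train):
                grad = (X_tr[i] @ w - y_tr[i]) * X_tr[i]  # gradient on one example
                w -= lr * grad
        return w

    for k in (1, 2, 5):
        w = sgd(num_passes=k)
        print(f"passes={k}: train loss={loss(w, X_tr, y_tr):.3f}, "
              f"test loss={loss(w, X_te, y_te):.3f}")

The only difference between the one-pass and multi-pass regimes in this sketch is the outer loop reusing the same training examples; the paper's question is how much that reuse can cost in terms of the train-test gap.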