Poster
On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics
Binghui Li · Yuanzhi Li
East Exhibition Hall A-B #E-2104
Adversarial training, like standard training, produces deep networks that generalize well to unseen clean data. However, although adversarial training also attains low robust training error, a significant robust generalization gap remains between training and test data. We call this the Clean Generalization and Robust Overfitting (CGRO) phenomenon. In this work, we study CGRO from two theoretical perspectives: representation complexity and training dynamics. We show that a moderately sized neural network can achieve CGRO by robustly memorizing its training data, whereas a fully robust classifier requires substantially more complex representations. We also analyze the training dynamics of a convolutional network and identify a three-stage phase transition during learning, after which the network converges to robust memorization, explaining the CGRO phenomenon. Our theoretical analysis is supported by experiments on real-world image recognition datasets.
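For readers unfamiliar with the setup, the sketch below illustrates the standard PGD-based adversarial training loop (Madry et al., 2018) under which CGRO is observed. It is a minimal illustration, not the authors' method: the perturbation budget, step size, iteration count, and function names are illustrative assumptions rather than the paper's experimental configuration.

```python
# Minimal PGD-based adversarial training sketch (PyTorch). The attack
# hyperparameters (eps, alpha, steps) are common illustrative defaults,
# not the settings used in the paper's experiments.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: find an L_inf perturbation within the eps-ball
    that approximately maximizes the classification loss."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Gradient-sign ascent step, then project back into the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    # Keep the perturbed input a valid image in [0, 1].
    return (x + delta).clamp(0, 1).detach()

def adversarial_training_epoch(model, loader, optimizer):
    """Outer minimization: train the model on worst-case (adversarial) inputs."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```

Under this scheme, CGRO manifests as high clean test accuracy and near-zero robust training error coexisting with much lower robust test accuracy.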