Poster
Adapting to Evolving Adversaries with Regularized Continual Robust Training
Sihui Dai · Christian Cianfarani · Vikash Sehwag · Prateek Mittal · Arjun Bhagoji
East Exhibition Hall A-B #E-2201
Adversarial examples are a phenomenon where neural networks are fooled by small, imperceptible perturbations. Many existing techniques for training neural networks to be robust against adversarial examples focus on a specific type of perturbation (i.e., those which lie within a specific bounded distance from the original image). These methods typically include examples of perturbed inputs during training to teach the model the kind of mistakes to avoid. However, the space of imperceptible perturbations to which models are vulnerable is large, and over time new perturbations (i.e., small transformations to the inputs) may emerge that were not considered when the model was robustly trained (researchers have come up with many clever transformations, including changing the color of pixels slightly!). In this work, we propose repeatedly training the model against new attacks, using the previous iteration of the model as a starting point. We also add a term during training designed to prevent the model's outputs from drifting too far apart across different perturbation types. This improves robustness to new perturbations without 'forgetting' robustness against previous types. Our approach takes a step toward robust models that can be easily adapted to new attacks, which is important for applications where robustness is critical and model retraining is expensive.
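
To make the high-level idea concrete, here is a minimal sketch of one training step in a PyTorch-style setup. The names `old_attack`, `new_attack`, and the regularization weight `lam` are illustrative placeholders rather than the authors' exact implementation, and the KL-divergence regularizer is just one plausible choice of distance between outputs under different perturbation types.

```python
# Sketch of regularized continual robust training (assumptions noted above).
# `model` is assumed to be initialized from the previously robustly trained
# checkpoint, so training on the new attack starts from that iteration.
import torch
import torch.nn.functional as F

def continual_robust_step(model, optimizer, x, y, old_attack, new_attack, lam=1.0):
    """One training step: adversarial loss on the new attack plus a
    regularizer that keeps outputs on old- and new-attack examples close."""
    model.train()
    # Generate perturbed inputs for the previously seen and the new attack
    # (both are hypothetical callables that return adversarial examples).
    x_old = old_attack(model, x, y)   # e.g., a bounded-distance attack seen before
    x_new = new_attack(model, x, y)   # e.g., a newly discovered perturbation type

    logits_old = model(x_old)
    logits_new = model(x_new)

    # Standard adversarial training loss on the new perturbation type.
    adv_loss = F.cross_entropy(logits_new, y)

    # Regularization term: discourage the model's outputs under different
    # perturbation types from drifting apart.
    reg = F.kl_div(F.log_softmax(logits_new, dim=1),
                   F.softmax(logits_old, dim=1),
                   reduction="batchmean")

    loss = adv_loss + lam * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```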