Poster

Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs

Hancheng Min · Rene Vidal

West Exhibition Hall B2-B3 #W-721
Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: Deep learning-based classifiers are known to be vulnerable to adversarial attacks. Existing methods for defending against such attacks require adding a defense mechanism or modifying the learning procedure (e.g., by adding adversarial examples). This paper shows that for certain data distributions one can learn a provably robust classifier using standard learning methods and without adding a defense mechanism. More specifically, this paper addresses the problem of finding a robust classifier for a binary classification problem in which the data comes from an isotropic mixture of Gaussians with orthonormal cluster centers. First, we characterize the largest $\ell_2$-attack any classifier can defend against while maintaining high accuracy, and show the existence of optimal robust classifiers achieving this maximum $\ell_2$-robustness. Next, we show that given data from the orthonormal Gaussian mixture model, gradient flow on a two-layer network with a polynomial ReLU activation and without adversarial examples provably finds an optimal robust classifier.
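For readers who want to experiment with the setting described in the abstract, here is a minimal, self-contained sketch (not the authors' code) of the data model and training procedure: samples drawn from an isotropic Gaussian mixture with orthonormal cluster centers, and a two-layer network with a polynomial ReLU activation trained by plain gradient descent as a discrete proxy for gradient flow. The dimension, number of clusters, network width, exponent p, noise level, loss, initialization scale, and step size are illustrative assumptions, not values from the paper, and the single-step $\ell_2$ gradient attack at the end is only a rough robustness probe, not the worst-case perturbation the paper characterizes.

```python
# Sketch of the orthonormal-GMM setting with a two-layer polynomial-ReLU network.
# All hyperparameters below are illustrative assumptions, not values from the paper.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
d, K, n, width, p = 50, 4, 400, 100, 3   # dimension, clusters, samples, hidden width, ReLU exponent (assumed)
sigma_noise = 0.1                        # within-cluster noise level (assumed)

# Orthonormal cluster centers: the first K standard basis vectors of R^d.
mu = jnp.eye(d)[:K]
labels_per_cluster = jnp.array([1., -1., 1., -1.])   # arbitrary +/-1 label assignment

key, k_cluster, k_noise, k_init = jax.random.split(key, 4)
cluster = jax.random.randint(k_cluster, (n,), 0, K)
X = mu[cluster] + sigma_noise * jax.random.normal(k_noise, (n, d))
y = labels_per_cluster[cluster]

def init_params(key, scale=0.1):
    # Small initialization (scale is an assumption).
    kw, kv = jax.random.split(key)
    W = scale * jax.random.normal(kw, (width, d))
    v = scale * jax.random.normal(kv, (width,))
    return W, v

def forward(params, x):
    W, v = params
    pre = x @ W.T                        # pre-activations
    act = jnp.maximum(pre, 0.0) ** p     # polynomial ReLU: relu(z)^p
    return act @ v

def loss(params, X, y):
    # Logistic loss on the margins; standard training, no adversarial examples.
    margins = y * forward(params, X)
    return jnp.mean(jax.nn.softplus(-margins))

params = init_params(k_init)
lr = 0.1                                 # small step size as a stand-in for gradient flow
grad_fn = jax.jit(jax.grad(loss))
for step in range(10_000):
    g = grad_fn(params, X, y)
    params = jax.tree_util.tree_map(lambda w, gw: w - lr * gw, params, g)

train_acc = jnp.mean(jnp.sign(forward(params, X)) == y)
print("clean train accuracy:", train_acc)

# Rough robustness probe (assumption): a single-step l2 gradient attack of size eps,
# not the optimal l2 perturbation analyzed in the paper.
eps = 0.3
def neg_margin(x, yi):
    return -yi * forward(params, x)
grad_x = jax.vmap(jax.grad(neg_margin), in_axes=(0, 0))(X, y)
delta = eps * grad_x / (jnp.linalg.norm(grad_x, axis=1, keepdims=True) + 1e-12)
robust_acc = jnp.mean(jnp.sign(forward(params, X + delta)) == y)
print(f"accuracy under single-step l2 attack of size {eps}:", robust_acc)
```

Using gradient descent with a small step size is a standard discretization of the gradient flow studied in the paper; the small initialization scale matters because the analysis concerns the trajectory from near-zero weights.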

Lay Summary:

The standard neural network training paradigm often produces networks that are susceptible to malicious attacks, which manipulate the network's outputs by injecting human-imperceptible perturbations into its inputs. We use an idealized mathematical model to explain this vulnerability of neural networks and to provide insight into how it can be addressed.
