Spotlight Poster
Sharp Generalization for Nonparametric Regression by Over-Parameterized Neural Networks: A Distribution-Free Analysis in Spherical Covariate
Yingzhen Yang
East Exhibition Hall A-B #E-2006
Understanding how well neural networks trained by gradient descent (GD) can generalize to new data is a core challenge in modern machine learning. In this work, we study a specific problem: predicting smooth relationships between inputs and outputs (nonparametric regression) using a two-layer neural network trained by GD. We show that, with early stopping, such a network can match the best-known performance of classical kernel methods — a class of powerful, well-understood algorithms — without relying on strong assumptions about the data distribution.

Our results show that even with minimal structural assumptions (requiring only that the input data lie on a sphere), these neural networks achieve the same optimal prediction accuracy as if the data had followed more idealized, structured distributions. This makes our findings more widely applicable.

To prove our results, we developed two new techniques: one that tracks how the network evolves during training, and another that carefully measures the complexity of all functions the network could learn. These tools may be useful in understanding broader classes of learning algorithms.
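To make the setting concrete, below is a minimal, illustrative sketch (not the paper's exact construction or analysis): a two-layer ReLU network whose outer weights are fixed and whose hidden layer is trained by full-batch gradient descent on noisy regression data with covariates on the unit sphere, with training halted by an early-stopping rule based on a held-out validation set. The target function, network width, step size, and stopping rule are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(n, d):
    """Draw n covariates uniformly on the unit sphere S^{d-1}."""
    x = rng.normal(size=(n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

d, n, m = 5, 500, 2000                            # input dim, sample size, network width
f_star = lambda x: np.sin(3 * x[:, 0])            # hypothetical smooth target function
X = sample_sphere(n, d)
y = f_star(X) + 0.1 * rng.normal(size=n)          # noisy regression observations
X_val = sample_sphere(200, d)
y_val = f_star(X_val) + 0.1 * rng.normal(size=200)

W = rng.normal(size=(m, d)) / np.sqrt(d)          # hidden-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # outer weights (kept fixed)

def predict(W, X):
    # f(x) = sum_r a_r * relu(w_r . x)
    return np.maximum(X @ W.T, 0.0) @ a

lr, max_steps, patience = 0.5, 2000, 50
best_val, best_W, since_best = np.inf, W.copy(), 0
for step in range(max_steps):
    H = X @ W.T                                   # pre-activations, shape (n, m)
    resid = predict(W, X) - y                     # training residuals
    # full-batch gradient of 0.5 * mean squared error w.r.t. the hidden weights
    grad = ((resid[:, None] * (H > 0)) * a).T @ X / n
    W -= lr * grad
    val_err = np.mean((predict(W, X_val) - y_val) ** 2)
    if val_err < best_val:
        best_val, best_W, since_best = val_err, W.copy(), 0
    else:
        since_best += 1
        if since_best >= patience:                # early stopping: halt once validation
            break                                 # error stops improving
print(f"stopped at step {step}, best validation MSE {best_val:.4f}")
```

In this sketch, early stopping plays the role of the regularizer: rather than training to interpolation, the iterate whose held-out error is smallest is retained, which is the kind of stopped-GD estimator whose generalization the paper analyzes.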