Poster
Learning Curves of Stochastic Gradient Descent in Kernel Regression
Haihan Zhang · Weicheng Lin · Yuanshi Liu · Cong Fang
West Exhibition Hall B2-B3 #W-619
Modern machine learning often deals with very high-dimensional data, meaning each data point has many features or variables. In such settings, stochastic gradient descent (SGD), a simple yet powerful algorithm, often performs remarkably well. Our work addresses the question: how effective is SGD when applied to kernel regression, a classic machine learning method, particularly when both the number of data points and the data dimension grow very large? We report a surprising phenomenon: for certain moderately challenging learning problems, SGD achieves optimal sample efficiency when the number of data points scales polynomially with the data dimension. Building on this finding, we further show that SGD can outperform spectral methods, such as kernel ridge regression (KRR), in simpler problem settings.
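
To make the setting concrete, here is a minimal sketch of single-pass SGD for kernel regression in a reproducing kernel Hilbert space: each new sample adds one expansion coefficient, set by a stochastic gradient step on the squared loss. The RBF kernel, the constant step size, and the toy high-dimensional data below are illustrative assumptions, not the specific setup analyzed in the paper.

```python
import numpy as np

def rbf_kernel(x, z, bandwidth=1.0):
    """Gaussian (RBF) kernel between two input vectors (illustrative choice)."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * bandwidth ** 2))

def sgd_kernel_regression(X, y, step_size=0.5, bandwidth=1.0):
    """Single-pass SGD in the RKHS.

    After processing sample t, the estimator has the kernel expansion
    f_t(x) = sum_i alpha[i] * K(X[i], x) over the samples seen so far.
    """
    n = X.shape[0]
    alpha = np.zeros(n)  # expansion coefficients, one per training point
    for t in range(n):
        # Prediction of the current estimator at the incoming point X[t].
        pred = sum(alpha[i] * rbf_kernel(X[i], X[t], bandwidth) for i in range(t))
        # SGD step on the squared loss 0.5 * (f(X[t]) - y[t])^2: the RKHS
        # gradient is (f(X[t]) - y[t]) * K(X[t], .), so the update only
        # touches the coefficient attached to the new point.
        alpha[t] = -step_size * (pred - y[t])

    def predict(x_new):
        return sum(alpha[i] * rbf_kernel(X[i], x_new, bandwidth) for i in range(n))

    return predict

# Toy usage (hypothetical data): high-dimensional inputs, noisy linear target.
rng = np.random.default_rng(0)
d, n = 50, 500
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)
f_hat = sgd_kernel_regression(X, y)
x_test = rng.standard_normal(d) / np.sqrt(d)
print("prediction at a fresh point:", f_hat(x_test))
```

The single-pass structure is what makes the sample-size versus dimension scaling question natural: each data point is visited once, so the number of gradient steps equals the number of samples.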