Spotlight Poster
Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness
Konstantinos Oikonomidis · Jan Quan · Emanuel Laude · Panagiotis Patrinos
West Exhibition Hall B2-B3 #W-505
Wed 16 Jul 10 a.m. PDT — 11 a.m. PDT
Gradient descent (GD) is one of the core methods for training models in modern machine learning. Nevertheless, especially in cases where the cost function is not "smooth" enough, gradient descent can become inefficient, requiring very small steps in order to find a solution. Our research looks at a smarter way to apply GD, reshaping the path taken by the algorithm through what is called nonlinear preconditioning. To do this, we consider a different way of thinking about smoothness that goes beyond the standard definitions used in the optimization literature. This allows us to cover a broader class of problems, including some that have recently attracted attention for being hard to optimize yet important in practice. We also show how the proposed framework includes popular techniques like gradient clipping and other similar methods, and extends them to new scenarios.
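To give a flavor of the idea, here is a minimal sketch (not the paper's exact algorithm or notation): a nonlinearly preconditioned gradient step applies a nonlinear map to the gradient before taking the step, and gradient clipping arises as one particular choice of that map. The function names, the clipping threshold `c`, and the quadratic test problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def clip_preconditioner(g, c=1.0):
    # One possible nonlinear preconditioner: rescale the gradient so its
    # norm never exceeds c (this is standard gradient clipping).
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

def preconditioned_gd(grad_f, x0, step=0.1, precond=clip_preconditioner, iters=100):
    # Generic preconditioned gradient iteration: x <- x - step * precond(grad_f(x)).
    # With precond = identity this reduces to plain gradient descent.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * precond(grad_f(x))
    return x

# Illustrative test problem: minimize the quadratic f(x) = 2 * ||x||^2.
grad = lambda x: 4.0 * x
print(preconditioned_gd(grad, [5.0, -3.0]))
```

Swapping in a different nonlinear map for `clip_preconditioner` yields other methods covered by the framework; the paper's contribution is to analyze such schemes under a generalized notion of smoothness.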