Spotlight Poster
Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness
Konstantinos Oikonomidis · Jan Quan · Emanuel Laude · Panagiotis Patrinos
West Exhibition Hall B2-B3 #W-505
Wed 16 Jul 10 a.m. PDT — 11 a.m. PDT
Gradient descent (GD) is one of the core methods for training models in modern machine learning. Nevertheless, especially in cases where the cost function is not "smooth" enough, gradient descent can become inefficient, requiring very small steps in order to find a solution. Our research looks at a smarter way to apply GD, reshaping the path taken by the algorithm through what is called nonlinear preconditioning. To do this, we consider a different way of thinking about smoothness that goes beyond the standard definitions used in the optimization literature. This allows us to cover a broader class of problems, including some that have recently attracted attention for being hard to optimize yet important in practice. We also show how the proposed framework includes popular techniques like gradient clipping and other similar methods, and extends them to new scenarios.
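To give a flavor of the idea, here is a minimal sketch (not the paper's exact algorithm or notation): a nonlinearly preconditioned gradient step applies a nonlinear map to the gradient before taking the step, and gradient clipping arises as one particular choice of that map. The function names, the clipping threshold `c`, and the quadratic test problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def clip_preconditioner(g, c=1.0):
    # One possible nonlinear preconditioner: rescale the gradient so its
    # norm never exceeds c (this is standard gradient clipping).
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

def preconditioned_gd(grad_f, x0, step=0.1, precond=clip_preconditioner, iters=100):
    # Generic preconditioned gradient iteration: x <- x - step * precond(grad_f(x)).
    # With precond = identity this reduces to plain gradient descent.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * precond(grad_f(x))
    return x

# Illustrative test problem: minimize the quadratic f(x) = 2 * ||x||^2.
grad = lambda x: 4.0 * x
print(preconditioned_gd(grad, [5.0, -3.0]))
```

Swapping in a different nonlinear map for `clip_preconditioner` yields other methods covered by the framework; the paper's contribution is to analyze such schemes under a generalized notion of smoothness.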