

Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)

Gradient descent in presence of extreme flatness and steepness

Dravyansh Sharma

Keywords: [ learning rate ] [ gradient descent ] [ non-smooth optimization ] [ Newton's method ] [ non-convex optimization ]


Abstract:

Typical theoretical analyses of the convergence of gradient descent require assumptions, such as convexity and smoothness, that do not hold in practice. Toward understanding the challenges of, and potential solutions for, learning with non-convex and non-smooth functions, we study the convergence of gradient descent for a simple sigmoid-based function family. The functions in this family simultaneously exhibit extreme flatness and extreme sharpness, making it particularly challenging to choose a step size. We show that both small and large step sizes fail; in fact, convergence is a highly volatile function of the initialization and the learning rate. We observe similar challenges with a known regularized version of Newton's method. We propose a novel Newton-damped gradient descent that performs well for the non-convex, non-smooth family under study, in the sense that most settings of the learning rate lead to convergence. Our small-scale experiments indicate interesting directions for both future empirical and theoretical research.
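The abstract does not spell out the function family or the proposed update rule, so the following is only a hypothetical sketch of the kind of phenomenon it describes: a one-dimensional sigmoid-based loss (here tanh(a·x)^2, an illustrative choice, not the paper's construction) that is nearly flat away from its minimizer and very steep close to it, plain gradient descent at several step sizes, and a standard Levenberg-Marquardt-style regularized Newton step for comparison (not necessarily the variant studied in the paper). All constants below are assumptions chosen for illustration.

```python
# Hypothetical sketch -- not the paper's construction or algorithm.
# Illustrates a sigmoid-based 1-D loss with a narrow, steep well at 0 and
# near-zero gradient (extreme flatness) elsewhere, and how sensitive plain
# gradient descent on it is to the choice of step size.

import numpy as np

A = 25.0  # illustrative sharpness: larger A => flatter tails, steeper well


def loss(x):
    # tanh is sigmoid-based (tanh(z) = 2*sigmoid(2z) - 1); tanh(A*x)^2 has a
    # unique minimum at x = 0, a steep well of width ~1/A, and saturates to 1
    # (with vanishing gradient) once |x| is moderately large.
    return np.tanh(A * x) ** 2


def grad(x):
    t = np.tanh(A * x)
    return 2.0 * A * t * (1.0 - t ** 2)


def hess(x):
    t = np.tanh(A * x)
    return 2.0 * A ** 2 * (1.0 - t ** 2) * (1.0 - 3.0 * t ** 2)


def gradient_descent(x0, lr, steps=200):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def regularized_newton(x0, lam=1.0, steps=200):
    # A standard regularized Newton step: divide by (|f''| + lam) so the
    # update stays bounded where the curvature is tiny or negative. This is
    # a generic choice, not necessarily the regularization from the paper.
    x = x0
    for _ in range(steps):
        x = x - grad(x) / (abs(hess(x)) + lam)
    return x


if __name__ == "__main__":
    x0 = 0.08  # start just outside the steep well around 0
    for lr in [1e-6, 1e-3, 1e-1, 1.0]:
        xT = gradient_descent(x0, lr)
        print(f"GD  lr={lr:8.6f}: final x = {xT:+.4f}, loss = {loss(xT):.4f}")
    xT = regularized_newton(x0)
    print(f"Regularized Newton: final x = {xT:+.4f}, loss = {loss(xT):.4f}")
```

Running the sketch and varying the learning rate shows the tension the abstract points to: very small steps make negligible progress where the loss is flat, while large steps can overshoot the narrow steep well and land back in a flat region where the gradient is again negligible.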
