

Poster

Scalable Meta-Learning via Mixed-Mode Differentiation

Iurii Kemaev · Dan Andrei Calian · Luisa Zintgraf · Gregory Farquhar · Hado van Hasselt

East Exhibition Hall A-B #E-3304
Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation process itself, leading to "gradient-of-a-gradient" calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they often cannot fully exploit the specific structure of these problems out of the box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG -- a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs, yielding over 10x memory and up to 25% wall-clock time improvements over standard implementations in modern meta-learning setups.
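To give a concrete sense of the kind of structure mixed-mode differentiation exploits, the sketch below is a minimal single-step meta-gradient in JAX. It is an illustrative assumption, not the paper's implementation: the inner objective, the scalar hyperparameter, and all function names are made up. The second-order mixed-partial term is computed with forward-over-reverse mode (jax.jvp applied to jax.grad) rather than reverse-over-reverse mode, which avoids building and storing a second reverse-mode graph over the inner computation.

import jax
import jax.numpy as jnp

def inner_loss(params, hyper, batch):
    # Toy regularised least squares; `hyper` is a meta-learned L2 weight (illustrative).
    x, y = batch
    return jnp.mean((x @ params - y) ** 2) + hyper * jnp.sum(params ** 2)

def outer_loss(params, batch):
    x, y = batch
    return jnp.mean((x @ params - y) ** 2)

def meta_grad_mixed(params, hyper, train_batch, val_batch, lr=0.1):
    # One inner SGD step: params' = params - lr * dL_inner/dparams.
    inner_grad = jax.grad(inner_loss)(params, hyper, train_batch)
    new_params = params - lr * inner_grad

    # Outer gradient w.r.t. the updated parameters (reverse mode).
    v = jax.grad(outer_loss)(new_params, val_batch)

    # Chain rule: dL_outer/dhyper = v . dparams'/dhyper
    #                             = -lr * v . d^2 L_inner / (dhyper dparams).
    # The mixed partial is obtained with a forward-mode JVP of the inner
    # gradient w.r.t. hyper (forward-over-reverse), so no second reverse-mode
    # pass over the inner computation is recorded.
    inner_grad_fn = lambda h: jax.grad(inner_loss)(params, h, train_batch)
    _, mixed = jax.jvp(inner_grad_fn, (hyper,), (jnp.ones_like(hyper),))
    return -lr * jnp.vdot(v, mixed)

# Example usage (shapes and values are arbitrary):
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 4))
y = jnp.ones(8)
g = meta_grad_mixed(jnp.zeros(4), jnp.float32(0.01), (x, y), (x, y))

The same forward-over-reverse idea generalises to unrolled multi-step inner loops, where the memory savings over a naive reverse-over-reverse formulation become substantial; the paper's contribution is how to arrange these mixed-mode pieces into efficient computational graphs at scale.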

Lay Summary:

Gradient-based bilevel optimisation is a powerful technique in deep learning with many important applications in meta-learning, hyperparameter tuning, and beyond. It is known to be computationally expensive, as it requires differentiating through the optimisation process itself. We introduce a practical algorithm called Mixed-Flow Meta-Gradients, which uses mixed-mode differentiation to drastically reduce the cost of this process, saving 90% of memory and 25% of compute whilst fully preserving the original numerical results.
