Poster
Maximum Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators
Shanda Li · Shinjae Yoo · Yiming Yang
East Exhibition Hall A-B #E-1607
Scientists use computer programs called neural networks to solve complex physics equations that describe phenomena like fluid flow, heat transfer, and wave propagation. To handle harder problems, these networks must grow much larger, but finding the right training settings for a large model becomes prohibitively expensive, sometimes requiring months of computing time. Our research addresses this with a mathematical result: if the model's internal scales (its weight initialization and learning rates) are adjusted in a specific, principled way as the model grows, the optimal training settings remain constant across model sizes. This means we can find the best settings using a small, inexpensive version of the model, then apply those same settings to train a much larger one, much as a recipe for a small cake can guide the baking of a wedding cake. Testing our approach on fluid dynamics problems, we successfully train models with nearly one billion parameters while using only about 30% of the traditional computational cost, making powerful physics simulators accessible to researchers without breaking their computing budgets.
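For readers who want a concrete picture, here is a minimal sketch of the kind of width-dependent scaling rule that Maximum Update Parametrization (muP) prescribes for a fully connected weight matrix trained with Adam. The base width, base learning rate, and function name are hypothetical placeholders for illustration, not values or code from the paper:

```python
import math

BASE_WIDTH = 64   # hypothetical width at which hyperparameters are tuned (the cheap model)
BASE_LR = 3e-4    # hypothetical learning rate found by sweeping the small model

def mup_layer_settings(fan_in, is_output=False):
    """Return (init_std, adam_lr, output_multiplier) for one weight matrix
    under muP-style scaling, so a learning rate tuned at BASE_WIDTH can be
    reused unchanged as the model grows wider."""
    width_ratio = fan_in / BASE_WIDTH
    init_std = 1.0 / math.sqrt(fan_in)              # initialization shrinks with fan-in
    adam_lr = BASE_LR / width_ratio                 # per-layer Adam LR scales ~ 1/width
    out_mult = 1.0 / width_ratio if is_output else 1.0  # damp the readout layer
    return init_std, adam_lr, out_mult

# The same BASE_LR, tuned once at width 64, is reused at every width;
# only the per-layer scales change mechanically with the width ratio.
for width in (64, 256, 1024):
    std, lr, mult = mup_layer_settings(fan_in=width)
    print(f"width={width:5d}  init_std={std:.4f}  adam_lr={lr:.2e}  out_mult={mult:.4f}")
```

The key point the sketch illustrates is that no new hyperparameter search happens at the large width: the scaling rules convert the small model's tuned settings into the large model's settings automatically, which is what makes the transfer "zero-shot."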