

Spotlight Poster

FlashTP: Fused, Sparsity-Aware Tensor Product for Machine Learning Interatomic Potentials

Seung Lee · Hojoon Kim · Yutack Park · Dawoon Jeong · Seungwu Han · Yeonhong Park · Jae W. Lee

West Exhibition Hall B2-B3 #W-221
Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: Machine Learning Interatomic Potentials (MLIPs) enable efficient molecular dynamics (MD) simulations with high accuracy. While equivariant MLIPs achieve state-of-the-art accuracy, they face significant computational bottlenecks centered on their Tensor-Product layer, which accounts for up to 75% of training time and causes substantial memory overhead. We present FlashTP, a highly optimized tensor-product library that addresses these inefficiencies through kernel fusion, sparse computation, and path-aggregated execution. FlashTP achieves up to 41.6× and 60.8× kernel speedups over e3nn and NVIDIA cuEquivariance, respectively. For SevenNet-l3i5, it delivers 4.2× and 3.5× speedups while reducing peak memory usage by 6.3× and 6.2× for inference and training, respectively. The code is available at https://github.com/SNU-ARC/flashTP.

Lay Summary:

Imagine watching a slow-motion movie of atoms as they jiggle, bump into each other, and form new structures. That's what molecular dynamics (MD) simulations do on a computer, letting scientists see how materials behave or how proteins fold without costly lab experiments. Recently, researchers have started using machine-learning interatomic potentials (MLIPs), deep neural networks trained on high-precision quantum data, to make these simulations both faster and more accurate. However, MLIP-driven simulations are bottlenecked by a mathematical operation called the tensor product, which consumes approximately 75-90% of both computation time and memory.

We built FlashTP, an optimized GPU library that fuses the tensor product's many small steps into a single kernel, removing redundant data movement and skipping work that isn't needed. On modern hardware, FlashTP lets scientists train their models more than 3.5× faster, run simulations 4.2× faster, and use over 6× less memory compared to the popular MLIP framework e3nn.

Best of all, it plugs right into the e3nn framework, so you can switch on FlashTP with almost zero code changes and start seeing the speed boost immediately.
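For readers unfamiliar with the layer being accelerated, below is a minimal e3nn-only sketch of the equivariant tensor product that sits in the hot loop of models like SevenNet. The irreps layouts, edge count, and variable names are illustrative assumptions, not values taken from SevenNet-l3i5, and the FlashTP call itself is deliberately omitted; consult the repository at https://github.com/SNU-ARC/flashTP for its actual drop-in API.

```python
# Minimal sketch (assumed setup, not the SevenNet-l3i5 configuration) of the
# equivariant tensor-product layer that FlashTP targets, using the public e3nn API.
import torch
from e3nn import o3

# Per-edge inputs as typically fed to the tensor product inside an equivariant MLIP:
# node features gathered onto edges, and spherical harmonics of edge directions.
irreps_node = o3.Irreps("32x0e + 16x1o + 8x2e")   # illustrative feature layout
irreps_sh   = o3.Irreps("1x0e + 1x1o + 1x2e")     # spherical harmonics up to l=2
irreps_out  = o3.Irreps("32x0e + 16x1o + 8x2e")

# e3nn builds one coupling "path" per allowed (l1, l2, l3) combination; these
# per-path computations are what the abstract's "path-aggregated execution" refers to.
tp = o3.FullyConnectedTensorProduct(irreps_node, irreps_sh, irreps_out)
print(f"{len(tp.instructions)} coupling paths, {tp.weight_numel} weights")

num_edges = 10_000
x  = irreps_node.randn(num_edges, -1)  # per-edge node features
sh = irreps_sh.randn(num_edges, -1)    # per-edge spherical harmonics
out = tp(x, sh)                        # the operation FlashTP accelerates
print(out.shape)                       # (num_edges, irreps_out.dim)
```

In stock e3nn each coupling path contributes its own work and intermediate buffers per edge; as the abstract describes, FlashTP's kernel fusion, sparse computation, and path-aggregated execution collapse this into a single sparsity-aware GPU kernel, which is where the reported speedups and memory savings come from.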
