Poster
in
Workshop: 2nd AI for Math Workshop @ ICML 2025
README: Rapid Equation Discovery with Multimodal Encoders
Gregory Kang Ruey Lau · Yue Kang · Zi-Yu Khoo · Apivich Hemachandra · Ruth Wan Theng Chew · Bryan Kian Hsiang Low
Rapidly discovering scientific laws or interpretable symbolic equations from data is important in many settings, such as decision-making in time-sensitive, high-stakes scenarios, or applications involving interactive or iterative experimentation, as in scientific or machine learning workflows. However, existing methods, generally known as symbolic regression (SR), typically require long computation times to achieve good performance and must be run from scratch for each dataset. Recent methods that use pre-trained SR foundation models for faster inference also suffer from performance limitations and require large training datasets. In this work, we propose README, a framework for rapid equation discovery that can generate performant, interpretable equations from limited, noisy data in just a few seconds, and requires significantly less training data than past SR foundation model approaches. We achieve this by being the first to (1) work with image representations of datasets to efficiently capture their key properties, (2) combine the capabilities of open-source pre-trained text and image encoders to produce an informative SR embedding space, and (3) develop a novel Grey Wolf Optimizer with Bayesian Optimization (GWOBO) algorithm to rapidly find the best symbolic expression within seconds. We empirically show that README outperforms benchmarks on a wide range of realistic datasets, including real experimental data from various domains and noisy video-extracted dynamics.
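To make contribution (1) concrete, the sketch below illustrates one simple way a tabular dataset could be rasterized into a fixed-size image suitable for a pre-trained image encoder. This is a hypothetical illustration only: the function name, resolution, and min-max rasterization scheme are assumptions for exposition, not the actual README pipeline or its encoders.

```python
import numpy as np

def dataset_to_image(x, y, resolution=32):
    """Rasterize 1-D (x, y) samples onto a fixed-size binary grid.

    A minimal sketch of representing a dataset as an image: each sample
    lights up one pixel after min-max normalizing both axes, so the
    image captures the overall shape of the relationship between x and y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def norm(v):
        # Min-max normalize to [0, 1]; guard against a constant axis.
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    xi = np.clip((norm(x) * (resolution - 1)).astype(int), 0, resolution - 1)
    yi = np.clip((norm(y) * (resolution - 1)).astype(int), 0, resolution - 1)
    img = np.zeros((resolution, resolution), dtype=np.uint8)
    img[resolution - 1 - yi, xi] = 1  # flip rows so larger y appears higher
    return img

# Example: render a noisy quadratic as a 32x32 image.
rng = np.random.default_rng(0)
xs = np.linspace(-2, 2, 200)
ys = xs**2 + 0.1 * rng.standard_normal(xs.size)
img = dataset_to_image(xs, ys)
```

An image like `img` could then be fed to a pre-trained image encoder, with the resulting embedding used downstream in the SR search.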