Poster in Workshop: 2nd Generative AI for Biology Workshop

Modeling Molecular Sequences with Learning-Order Autoregressive Models

Zhe Wang · Jiaxin Shi · Nicolas Heess · Michalis Titsias · Arthur Gretton · Yee-Whye Teh

Keywords: [ Machine Learning ] [ Variational Inference ] [ Autoregressive Models ] [ Molecular Generation ]


Abstract:

Text-based autoregressive models (ARMs) are popular for SMILES (Simplified Molecular Input Line Entry System) string generation due to their simplicity and state-of-the-art performance, but typically use a fixed left-to-right order. Since optimal SMILES ordering is less obvious than for natural text, we developed LO-ARM (Learning-Order ARM) to learn a data-dependent generation order. Evaluated on ChEMBL, LO-ARM learns consistent and meaningful orderings that reveal molecular substructures, and matches or surpasses state-of-the-art models, offering a well-balanced yet competitive model option.
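
To make the idea of a learned generation order concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes generation starts from a fully masked sequence and alternates between two steps: an order policy picks which masked position to fill next, and a token model picks the token for that position. The names `order_policy`, `token_model`, `MASK`, and `VOCAB` are hypothetical, and both distributions are uniform placeholders where a trained model would supply learned, data-dependent probabilities. The output is therefore a random string over SMILES-like characters, not a valid molecule.

```python
import random

MASK = "_"  # hypothetical placeholder for a not-yet-generated position
VOCAB = ["C", "N", "O", "(", ")", "=", "1"]  # toy SMILES-like alphabet (assumption)


def order_policy(seq):
    """Hypothetical stand-in for a learned order policy.

    Returns a distribution over the positions that are still masked.
    Here it is uniform; a learned policy would depend on the partial sequence.
    """
    masked = [i for i, tok in enumerate(seq) if tok == MASK]
    return {i: 1.0 / len(masked) for i in masked}


def token_model(seq, pos):
    """Hypothetical stand-in for a learned token predictor at position `pos`.

    Here it is uniform over VOCAB and ignores its inputs.
    """
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}


def sample(dist, rng):
    """Draw one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate(length, rng):
    """Fill a masked sequence one position at a time, in a sampled order."""
    seq = [MASK] * length
    order = []
    for _ in range(length):
        pos = sample(order_policy(seq), rng)           # choose where to write
        seq[pos] = sample(token_model(seq, pos), rng)  # choose what to write
        order.append(pos)
    return "".join(seq), order


rng = random.Random(0)
smiles_like, order = generate(8, rng)
print(smiles_like, order)
```

A fixed left-to-right ARM is the special case in which `order_policy` always puts all probability on the leftmost masked position. The recorded `order` list is the kind of quantity one would inspect to see which parts of a sequence a model chooses to generate first.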
