Spotlight Poster
Elucidating the Design Space of Multimodal Protein Language Models
Cheng-Yen Hsieh · Xinyou Wang · Daiheng Zhang · Dongyu Xue · Fei YE · Shujian Huang · Zaixiang Zheng · Quanquan Gu
West Exhibition Hall B2-B3 #W-115
Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, tokenizing 3D structures into discrete tokens causes a substantial loss of fidelity in fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome these limitations. We identify tokenization loss and inaccurate structure token prediction by the PLMs as major bottlenecks. To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advances introduce finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling. These effective design methods dramatically improve structure generation diversity and, notably, the folding ability of our 650M model, reducing the RMSD from 5.52 to 2.36 on the PDB test set, even outperforming 3B baselines and performing on par with specialized folding models. Project page and code: https://bytedance.github.io/dplm/dplm-2.1.
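The folding numbers above are root-mean-square deviations (RMSD, in Å) between predicted and reference coordinates after optimal rigid-body superposition. As a minimal sketch of this metric, assuming Cα coordinates and the standard Kabsch alignment (the function name and array shapes are illustrative, not the paper's evaluation code):

```python
# Hedged sketch: Kabsch-aligned C-alpha RMSD between a predicted and a
# reference structure. Illustrative only; not the paper's eval pipeline.
import numpy as np

def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """pred, ref: (N, 3) C-alpha coordinates of the same protein."""
    # Center both point clouds at the origin.
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)
    # Optimal rotation via SVD of the 3x3 covariance matrix (Kabsch).
    u, _, vt = np.linalg.svd(p.T @ q)
    # Reflection correction keeps the transform a proper rotation (det = +1).
    d = np.sign(np.linalg.det(u @ vt))
    p_aligned = p @ u @ np.diag([1.0, 1.0, d]) @ vt
    # RMSD over residues after superposition.
    return float(np.sqrt(((p_aligned - q) ** 2).sum(axis=1).mean()))
```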
Proteins are essential molecules of life, and understanding both their 3D structures and amino acid sequences is crucial for applications such as drug discovery and protein design. Recent computational models, known as multimodal protein language models (PLMs), learn to generate both by observing protein sequences alongside protein 3D structures that have been converted into small symbolic units called "tokens", a process called "tokenization". However, tokenization causes a substantial loss of structural detail, limiting the models' ability to accurately predict protein structures. In this paper, we address this challenge by elucidating effective design methods for multimodal PLMs. We propose several methods, including training strategies that help the model capture structural patterns more effectively, architectural designs tailored to proteins, and the exploration of multi-chain protein data, which carries rich structural arrangements and interaction scenarios. Our evaluations show that these designs effectively improve the structure-prediction accuracy of multimodal PLMs, with fewer model parameters and hence less computational overhead than prior baselines.
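To make the tokenization bottleneck concrete, here is a minimal sketch of a vector-quantization-style structure tokenizer, assuming a nearest-neighbor codebook lookup; the codebook size, feature dimension, and variable names are illustrative assumptions, not the paper's tokenizer:

```python
# Hedged sketch of why tokenizing 3D structure loses detail: a VQ-style
# tokenizer snaps each residue's continuous structure feature to its
# nearest entry in a finite codebook, and decoding can only recover
# that codebook entry, so fine-grained geometry is discarded.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 16))   # 512 discrete structure tokens (made-up size)
features = rng.normal(size=(100, 16))   # per-residue structure features (made-up dim)

# Tokenize: index of the nearest codebook vector for each residue.
dists = np.linalg.norm(features[:, None] - codebook[None], axis=-1)
tokens = dists.argmin(axis=1)           # shape (100,), ints in [0, 512)

# "Detokenize": the best possible reconstruction is the codebook entry,
# so the residual below is the irreducible tokenization loss.
recon = codebook[tokens]
print("mean quantization error:", np.linalg.norm(features - recon, axis=1).mean())
```

Because a finite codebook can only approximate a continuous space of conformations, any model trained purely on these token indices inherits this quantization error, which is why the designs above add finer-grained structural supervision.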