

Poster in Workshop: 2nd Generative AI for Biology Workshop

BC-Design: A Biochemistry-Aware Framework for Highly Accurate Inverse Protein Folding

Robert Tang · Xinwu Ye · Fang Wu · Daniel Shao · Dong Xu · Mark Gerstein

Keywords: Biochemistry Encoding, Structural Bioinformatics, Protein Design, Deep Learning


Abstract:

Inverse protein folding, which aims to design amino acid sequences for desired protein structures, is fundamental to protein engineering and therapeutic development. While recent deep-learning approaches have made remarkable progress, they typically represent biochemical properties as discrete features associated with individual residues. Here, we present BC-Design, a framework that represents biochemical properties as continuous distributions across protein surfaces and interiors. Through contrastive learning, our model learns to encode essential biochemical information within structure embeddings, so that at inference time it predicts sequences from structural input alone. This keeps the model compatible with real-world applications while still benefiting from biochemical awareness. On the CATH 4.2 benchmark, BC-Design achieves 88% sequence recovery compared with 67% for state-of-the-art methods (an absolute improvement of 21 percentage points) and reduces perplexity from 2.4 to 1.5 (a 39.5% relative improvement). Notably, our model generalizes robustly across diverse protein characteristics, performing consistently well on proteins of varying sizes (50 to 500 residues), varying structural complexity (measured by contact order), and all major CATH fold classes. Ablation studies show that structural and biochemical information make complementary contributions to this performance. Overall, BC-Design establishes a new paradigm for integrating multimodal protein information, opening new avenues for computational protein engineering and drug discovery.
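The two headline metrics, sequence recovery and perplexity, can be made concrete with a short sketch. It assumes the definitions common in the inverse-folding literature: recovery is the fraction of positions where the designed residue matches the native one, and perplexity is the exponential of the mean negative log-likelihood the model assigns to the native residues. The function names and the `AMINO_ACIDS` ordering are hypothetical, and how the reported CATH 4.2 numbers are aggregated (per protein or pooled over all residues) is not stated in the abstract, so this sketch scores a single sequence only.

```python
import math
from typing import Sequence

# Hypothetical vocabulary: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_recovery(predicted: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native residue."""
    if not native or len(predicted) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


def perplexity(probs: Sequence[Sequence[float]], native: str) -> float:
    """exp(mean negative log-likelihood of the native residues).

    probs[i] is the model's probability distribution over AMINO_ACIDS at
    position i (assumed already normalized, e.g. a softmax output).
    """
    if not native or len(probs) != len(native):
        raise ValueError("need one distribution per native residue")
    total_nll = 0.0
    for dist, residue in zip(probs, native):
        # Accumulate -log p(native residue) at this position.
        total_nll -= math.log(dist[AMINO_ACIDS.index(residue)])
    return math.exp(total_nll / len(native))


if __name__ == "__main__":
    print(sequence_recovery("ACDE", "ACDF"))  # 3 of 4 positions match
    uniform = [[1 / 20] * 20 for _ in range(5)]
    print(perplexity(uniform, "ACDEF"))  # a uniform guess gives roughly 20
```

For reference, a uniform guess over the 20 amino acids yields a perplexity of 20, and a model that places all probability on the native residue yields 1. A perplexity of 1.5 therefore means the model concentrates most of its probability mass on the native residue.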
