

Poster

CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models

Junbo Yin · Chao Zha · Wenjia He · Chencheng Xu · Xin Gao

West Exhibition Hall B2-B3 #W-315
Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Existing protein language models (PLMs) generate protein sequences under a single-condition constraint from one specific modality and struggle to satisfy multiple constraints across different modalities simultaneously. In this work, we introduce CFP-GEN, a novel diffusion language model for Combinatorial Functional Protein GENeration. CFP-GEN facilitates de novo protein design by integrating multimodal conditions with functional, sequence, and structural constraints. Specifically, an Annotation-Guided Feature Modulation (AGFM) module is introduced to dynamically adjust the protein feature distribution based on composable functional annotations, e.g., GO terms, IPR domains, and EC numbers. Meanwhile, the Residue-Controlled Functional Encoding (RCFE) module captures residue-wise interactions to ensure more precise control. Additionally, off-the-shelf 3D structure encoders can be seamlessly integrated to impose geometric constraints. We demonstrate that CFP-GEN enables high-throughput generation of novel proteins with functionality comparable to natural proteins, while achieving a high success rate in designing multifunctional proteins.
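
The abstract does not say how AGFM adjusts features. As a rough illustration only, the sketch below assumes a simple scale-and-shift (FiLM-style) modulation in which each active annotation contributes a scale and a shift vector, and the contributions are summed so that annotations compose. The names `make_embeddings` and `modulate`, the random embedding table, the feature dimension, and the choice of example annotation IDs are all hypothetical and are not taken from the paper.

```python
import random

# Illustrative annotation IDs, one per annotation type named in the abstract
# (GO term, IPR domain, EC number). The selection is an assumption.
ANNOTATIONS = ["GO:0003824", "IPR000719", "EC:2.7.11.1"]
DIM = 4  # toy feature dimension (assumption)


def make_embeddings(annotations, dim, seed=0):
    """Give each annotation a fixed random (scale, shift) pair of vectors.

    A trained model would learn these; random values stand in for them here.
    """
    rng = random.Random(seed)
    table = {}
    for name in annotations:
        scale = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        shift = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        table[name] = (scale, shift)
    return table


def modulate(features, active, table):
    """Apply h' = h * (1 + sum of scales) + sum of shifts to each residue vector."""
    dim = len(features[0])
    total_scale = [0.0] * dim
    total_shift = [0.0] * dim
    for name in active:
        scale, shift = table[name]
        total_scale = [a + b for a, b in zip(total_scale, scale)]
        total_shift = [a + b for a, b in zip(total_shift, shift)]
    return [
        [h * (1.0 + s) + b for h, s, b in zip(row, total_scale, total_shift)]
        for row in features
    ]


table = make_embeddings(ANNOTATIONS, DIM)
features = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]  # two residues

unconditioned = modulate(features, [], table)  # no annotations: features unchanged
conditioned = modulate(features, ["GO:0003824", "EC:2.7.11.1"], table)
```

Summing the per-annotation contributions is one simple way to make conditions composable: any subset of annotations can be supplied, and an empty set leaves the features untouched. The actual module may combine annotations differently.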

Lay Summary:

We developed a new AI algorithm capable of generating entirely new proteins that do not exist in nature, yet exhibit functional properties comparable to natural proteins. This process is similar to how text-to-image models generate images from a description, but instead of pictures, our model generates proteins based on desired biological functions. By simply specifying what a protein should do, the model designs novel sequences that are ready for use in drug discovery, synthetic biology, and medical applications. This unlocks a new way to explore the protein universe: not by randomly generating proteins, but by purposefully designing them for desired functions, which makes the process far more efficient than previous approaches.
