ICML DisProtEdit: Exploring Disentangled Representations for Multi-Attribute Protein Editing

Poster in Workshop: 2nd Generative AI for Biology Workshop

Max KU · Sun Sun · Hongyu Guo · Wenhu Chen

Keywords: [ multimodal learning ] [ protein editing ]

Abstract:

We introduce DisProtEdit, a framework for controllable protein editing that learns disentangled representations of structural and functional properties using dual-channel natural language supervision. Unlike prior approaches that rely on joint holistic embeddings, DisProtEdit explicitly separates semantic factors, enabling modular and interpretable control. To support this, we construct SwissProtDis, a large-scale multimodal dataset where each protein sequence is paired with two textual descriptions, one for structure and one for function, automatically decomposed using a large language model. DisProtEdit aligns protein and text embeddings using alignment and uniformity objectives, while a disentanglement loss promotes independence between structural and functional semantics. At inference time, protein editing is performed by modifying one or both text inputs and decoding from the updated latent representation. Experiments on protein editing and representation learning benchmarks demonstrate that DisProtEdit performs competitively with existing methods while providing improved interpretability and controllability. On a newly constructed multi-attribute editing benchmark, the model achieves a both-hit success rate of up to 61.7%, highlighting its effectiveness in coordinating simultaneous structural and functional edits.
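
The abstract names alignment and uniformity objectives and a disentanglement loss but does not give their exact form. The sketch below is illustrative only. It assumes the alignment and uniformity losses follow their common formulation from the contrastive learning literature, computed on L2-normalized embeddings. The disentanglement term is a stand-in (a cross-covariance penalty between the structural and functional embeddings) and is not necessarily the loss used by DisProtEdit. All function names and the defaults `alpha=2.0` and `t=2.0` are assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Project each row onto the unit hypersphere."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def alignment_loss(protein_emb, text_emb, alpha=2.0):
    """Mean distance (raised to alpha) between paired protein/text embeddings."""
    dists = np.linalg.norm(protein_emb - text_emb, axis=1)
    return float(np.mean(dists ** alpha))

def uniformity_loss(emb, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs in the batch."""
    sq_dists = np.sum((emb[:, None, :] - emb[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(emb.shape[0], k=1)  # each unordered pair once
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))

def cross_covariance_penalty(z_struct, z_func):
    """Stand-in disentanglement term (assumption, not the paper's loss):
    squared Frobenius norm of the cross-covariance between the two channels."""
    zs = z_struct - z_struct.mean(axis=0, keepdims=True)
    zf = z_func - z_func.mean(axis=0, keepdims=True)
    cov = zs.T @ zf / (z_struct.shape[0] - 1)
    return float(np.sum(cov ** 2))
```

Under these assumptions, a training objective would sum the alignment term for each protein/text pairing, the uniformity term for each embedding set, and the penalty between the structural and functional channels. The relative weights would be hyperparameters not stated in the abstract.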
