

Poster in Workshop: 2nd Generative AI for Biology Workshop

Sparse Autoencoders in Protein Engineering Campaigns: Steering and Model Diffing

Gerard Corominas · Filippo Stocco · Noelia Ferruz

Keywords: [ Sparse autoencoders ] [ Protein language models ] [ Mechanistic interpretability ] [ Enzyme design ]


Abstract:

Protein Language Models (pLMs) have proven to be versatile tools in protein design, but their internal workings remain difficult to interpret. Here, we implement a mechanistic interpretability framework and apply it in two scenarios. First, by training sparse autoencoders (SAEs) on the model's activations, we identify and annotate features relevant to enzyme variant activity through a two-stage process of candidate selection followed by causal intervention. During sequence generation, we steer the model by clamping or ablating key SAE features, which increases the predicted enzyme activity. Second, we compare pLM checkpoints before and after three rounds of Reinforcement Learning (RL) by examining sequence regions with high divergence in per-token log-likelihood, identifying the residues that most align with higher predicted affinities.
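The steering step described above, clamping or ablating an SAE feature in a model activation, can be sketched as follows. This is a minimal illustration with toy NumPy weights standing in for a trained SAE; the names (`encode`, `decode`, `steer`, the weight matrices) are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32  # toy dimensions; real SAEs are much wider

# Toy weights standing in for a trained SAE over pLM activations.
W_enc = rng.normal(size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_sae)
b_dec = np.zeros(d_model)

def encode(h):
    # ReLU gives a sparse, non-negative feature code.
    return np.maximum(W_enc @ h + b_enc, 0.0)

def decode(z):
    return W_dec @ z + b_dec

def steer(h, feature_idx, value):
    """Clamp one SAE feature to `value` (ablate with value=0.0),
    then reconstruct the activation passed on to later layers."""
    z = encode(h)
    z[feature_idx] = value
    return decode(z)

h = rng.normal(size=d_model)              # one token's activation
h_steered = steer(h, feature_idx=3, value=5.0)  # clamp feature 3 on
h_ablated = steer(h, feature_idx=3, value=0.0)  # ablate feature 3
```

In practice the reconstructed activation would be substituted back into the forward pass (e.g. via a hook on a transformer layer) so that generation proceeds with the edited feature code.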
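The checkpoint-diffing step can likewise be sketched: given per-residue log-likelihoods from the base and RL-tuned checkpoints, rank residues by how much their likelihood shifted. The arrays and threshold here are invented for illustration; the paper's actual scoring may differ.

```python
import numpy as np

# Hypothetical per-residue log-likelihoods for one sequence,
# from the pre-RL and post-RL checkpoints.
logp_base = np.array([-1.2, -0.4, -2.1, -0.3, -3.0, -0.5])
logp_rl   = np.array([-1.1, -0.4, -0.6, -0.3, -0.8, -0.5])

# Divergence per residue: absolute change in log-likelihood.
div = np.abs(logp_rl - logp_base)

# Residues whose likelihood shifted most across the RL rounds
# are candidates for alignment with higher predicted affinity.
top_k = 2
top_residues = np.argsort(div)[::-1][:top_k]  # highest divergence first
```

High-divergence positions would then be inspected against the affinity predictor to see which residue changes drive the improvement.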
