Poster in Workshop: 2nd Generative AI for Biology Workshop

A framework to extract and interpret biological concepts from scRNAseq generative foundation models

Charlotte Claye · Pierre Marschall · Wassila Ouerdane · Céline Hudelot · Julien Duquesne

Keywords: [ foundation models ] [ scRNAseq ] [ post-hoc explainability ] [ concept-based explainability ]


Abstract:

Transcriptomic foundation models have recently demonstrated strong performance on downstream tasks but remain poorly understood due to their high complexity. There is thus a growing need for post-hoc interpretability at the intersection of deep learning and biology. Sparse auto-encoders have recently been used to identify millions of meaningful concepts encoded in the latent space of large language models and have been successfully applied to protein language models. A main challenge is the interpretation of these concepts, which should both reflect the internal mechanisms of the model and be comprehensible to domain experts. We introduce two novel approaches to interpreting latent concepts from single-cell RNAseq models. First, we identify a set of genes that contribute to a concept's activation, leveraging counterfactual perturbations of gene expression. Second, we interpret the gene set using textual gene descriptions from ontologies. We apply our interpretability framework to the cell embedding space of scGPT (Cui et al., 2024), focusing on immune cells. The methodology shows great promise in bridging the gap between deep learning experts and biology specialists.
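
To make the two steps concrete, the following is a minimal Python/PyTorch sketch of the general technique the abstract describes: a sparse auto-encoder over cell embeddings, plus counterfactual gene-expression perturbations to score each gene's contribution to a concept. It assumes a hypothetical embed_cells function mapping an expression matrix to cell embeddings (e.g., from scGPT); all names and signatures are illustrative and not the authors' actual implementation.

import numpy as np
import torch
import torch.nn as nn

class SparseAutoEncoder(nn.Module):
    """Over-complete auto-encoder; an L1 penalty on the latent code
    (applied during training, not shown) encourages sparse concepts."""
    def __init__(self, d_embed: int, d_concepts: int):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_concepts)
        self.decoder = nn.Linear(d_concepts, d_embed)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # concept activations
        return self.decoder(z), z

def concept_gene_scores(embed_cells, sae, X, concept_idx):
    """Score each gene's contribution to one concept by a counterfactual
    perturbation: silence the gene, re-embed the cells, and measure the
    average drop in the concept's activation. X is a (cells x genes)
    expression tensor; embed_cells is assumed to return cell embeddings.
    This naive loop costs one forward pass per gene."""
    with torch.no_grad():
        _, z_ref = sae(embed_cells(X))
        ref = z_ref[:, concept_idx]
        scores = np.zeros(X.shape[1])
        for g in range(X.shape[1]):
            X_cf = X.clone()
            X_cf[:, g] = 0.0  # counterfactual: zero out gene g
            _, z_cf = sae(embed_cells(X_cf))
            scores[g] = (ref - z_cf[:, concept_idx]).mean().item()
    return scores  # large positive score => gene drives the concept

The top-scoring genes for a concept could then be fed to a standard ontology-based enrichment analysis (e.g., over Gene Ontology term descriptions) to produce the textual interpretation described above.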
