Poster
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Konstantin Donhauser · Kristina Ulicna · Gemma Moran · Aditya Ravuri · Kian Kenyon-Dean · Cian Eastwood · Jason Hartford
East Exhibition Hall A-B #E-3309
Sparse dictionary learning (DL) has emerged as a powerful approach to extract semantically meaningful concepts from the internals of large language models (LLMs) trained mainly in the text domain. In this work, we explore whether DL can extract meaningful concepts from models trained on less human-interpretable scientific data, such as vision foundation models trained on cell microscopy images, where little prior knowledge exists about which high-level concepts should arise. We propose a novel combination of a sparse DL algorithm, Iterative Codebook Feature Learning (ICFL), with a PCA whitening pre-processing step derived from control data. Using this combined approach, we retrieve biologically meaningful concepts such as cell types and genetic perturbations. Moreover, we demonstrate how our method reveals subtle morphological changes arising from human-interpretable interventions, offering a promising new direction for scientific discovery via mechanistic interpretability in bioimaging.
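The whitening step described above can be sketched as follows. This is a minimal sketch, assuming model embeddings are stored as rows of a NumPy array and that a set of control samples is available; the function names `fit_control_whitening` and `apply_whitening` and the `eps` regularizer are hypothetical and not taken from the paper. The mean and covariance are estimated on control embeddings only, and every embedding (control or perturbed) is then projected onto the control principal axes and rescaled to unit variance along each axis.

```python
import numpy as np


def fit_control_whitening(control, eps=1e-6):
    """Fit PCA-whitening parameters on control embeddings (rows = samples).

    Hypothetical helper: only control samples are used to estimate statistics.
    """
    mean = control.mean(axis=0)
    centered = control - mean
    # Sample covariance of the control embeddings (ddof=1, like np.cov).
    cov = centered.T @ centered / (control.shape[0] - 1)
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort principal axes by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each column projects onto one principal axis and rescales it to unit variance;
    # eps (an assumption) guards against division by near-zero eigenvalues.
    transform = eigvecs / np.sqrt(eigvals + eps)
    return mean, transform


def apply_whitening(x, mean, transform):
    """Whiten any embeddings (control or perturbed) using control statistics."""
    return (x - mean) @ transform
```

Applied to the control set itself, the output has approximately zero mean and identity covariance; perturbed samples are expressed in the same coordinates, so any deviation from that control distribution is measured on a common scale before dictionary learning.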
Researchers in machine learning are increasingly interested in understanding how complex models process information internally, a field known as mechanistic interpretability. This area focuses on uncovering how models compute their outputs, rather than evaluating how well those outputs align with human intuition. One promising approach from this field, sparse dictionary learning, has succeeded in analyzing language models by identifying components inside the model that correspond to distinct patterns in language. In this work, we explore whether similar techniques can be used to study models trained not on text but on scientific data, such as microscopy images of cells. These models, known as vision foundation models, are trained to capture rich visual features but are much harder to interpret. We introduce a method that combines a sparse learning algorithm with a data-driven pre-processing step to help identify meaningful biological concepts. With it, we extract biological patterns such as differences between cell types and the effects of genetic perturbations. Beyond these interpretable internal features, the method also reveals subtle morphological changes in cells, suggesting new avenues for using machine learning and mechanistic interpretability to advance scientific discovery in bioimage data analysis.
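The core idea of sparse dictionary learning, representing each embedding as a sparse combination of a small number of learned "dictionary atoms," can be sketched as below. This is not ICFL: it is a generic stand-in that alternates top-k sparse coding (a least-squares fit on the k atoms most correlated with each sample) with a least-squares dictionary update, in the spirit of the method of optimal directions. All names and parameters (`sparse_code_topk`, `learn_dictionary`, `n_atoms`, `k`, `n_iters`) are hypothetical, and in a pipeline like the one described above the input would be pre-processed (e.g. whitened) embeddings.

```python
import numpy as np


def sparse_code_topk(x, dictionary, k):
    """Give each sample a code with at most k nonzero entries.

    Generic top-k sparse coding (not ICFL): pick the k atoms with the largest
    absolute correlation, then fit their coefficients by least squares.
    """
    n_samples, n_atoms = x.shape[0], dictionary.shape[0]
    codes = np.zeros((n_samples, n_atoms))
    scores = np.abs(x @ dictionary.T)
    support = np.argsort(-scores, axis=1)[:, :k]
    for i in range(n_samples):
        atoms = dictionary[support[i]]  # (k, dim)
        coef, *_ = np.linalg.lstsq(atoms.T, x[i], rcond=None)
        codes[i, support[i]] = coef
    return codes


def learn_dictionary(x, n_atoms, k, n_iters=20, seed=0):
    """Alternate sparse coding and a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    # Initialize atoms from randomly chosen samples, normalized to unit length.
    dictionary = x[rng.choice(x.shape[0], n_atoms, replace=False)].astype(float)
    dictionary /= np.maximum(np.linalg.norm(dictionary, axis=1, keepdims=True), 1e-12)
    for _ in range(n_iters):
        codes = sparse_code_topk(x, dictionary, k)
        # Dictionary update: minimize ||x - codes @ dictionary||^2 over the dictionary.
        dictionary, *_ = np.linalg.lstsq(codes, x, rcond=None)
        norms = np.linalg.norm(dictionary, axis=1)
        dead = norms < 1e-12
        if dead.any():
            # Re-seed unused atoms with random samples so they can be reused.
            dictionary[dead] = x[rng.choice(x.shape[0], int(dead.sum()))]
            norms = np.linalg.norm(dictionary, axis=1)
        dictionary /= np.maximum(norms, 1e-12)[:, None]
    # Final codes with respect to the final (unit-norm) dictionary.
    return dictionary, sparse_code_topk(x, dictionary, k)
```

Each row of the returned `codes` has at most `k` nonzero entries, so every sample is explained by a handful of atoms; in an interpretability setting, it is these atoms (and the samples that activate them most strongly) that one would inspect for concepts such as cell type or perturbation.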