Skip to yearly menu bar Skip to main content


Poster
in
Workshop: 2nd Generative AI for Biology Workshop

Teddy: A FAMILY OF FOUNDATION MODELS FOR UNDERSTANDING SINGLE CELL BIOLOGY

Alexis Chevalier · Soumya Ghosh · Urvi Awasthi · James Watkins · Julia Bieniewska · Nichita Mitrea · Olga Kotova · Kirill Shkura · Andrew Noble · Michael Steinbaugh · Vijay Sadashivaiah · George Dasoulas · Julien Delile · Christoph Meier · Leonid Zhukov · Iya Khalil · Srayanta Mukherjee · Judith Mueller

Keywords: [ foundation models ] [ single cell RNA seq ]


Abstract:

Understanding the biological mechanisms of disease is crucial for medicine, and in particular, for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving single-cell foundation models. First, we scaled the pre-training data to a diverse collection of 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the \model family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on several downstream evaluation tasks, including identifying the underlying disease state of held-out donors not seen during training, distinguishing between diseased and healthy cells for disease conditions and donors not seen during training, and probing the learned representations for known biology. Our models showed substantial improvement over existing works, and scaling experiments showed that performance improved predictably with both data volume and parameter count.

Chat is not available.