Poster in Workshop: 2nd Generative AI for Biology Workshop
Biological Reasoning with Reinforcement Learning through Natural Language Enables Generalizable Zero-Shot Cell Type Annotations
Xi Wang · Runzi Tan · Simona Cristea
Keywords: [ scRNAseq ] [ benchmarking ] [ DeepSeek-R1 ] [ cell type annotation ] [ zero-shot ] [ LLM ]
Single-cell RNA sequencing (scRNA-seq) has revolutionized biomedical research by enabling detailed characterization of cell populations, but cell type annotation remains labor-intensive and depends on expert knowledge and specialized algorithms. Here, we demonstrate that DeepSeek-R1, a 671B-parameter large language model (LLM) trained with reinforcement learning, can perform zero-shot scRNA-seq cell type annotation when prompted with ranked lists of marker genes. Through extensive benchmarking, we assess DeepSeek-R1's cell type annotation performance at both the cluster and single-cell levels and show that it achieves accuracy comparable or superior to that of expert models such as scTab and scGPT, while offering better adaptability, better generalization to novel datasets, and interpretable biological rationales for its predictions. These results highlight the potential of generalist LLMs in biological applications and provide evidence of test-time scaling in biology through natural-language reasoning.
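The abstract does not state how the marker genes are ranked or how the prompt is worded. The sketch below is therefore only an illustration under stated assumptions: marker genes are ranked by a simple fold change of a cluster's mean expression over the mean of all other clusters (a stand-in for a proper differential-expression test), and the ranked list is embedded in a hypothetical prompt template. The function names `rank_marker_genes` and `build_annotation_prompt`, the prompt wording, and the toy expression values are all hypothetical, and no LLM is called.

```python
from typing import Dict, List


def rank_marker_genes(
    mean_expr: Dict[str, Dict[str, float]],
    cluster: str,
    top_n: int = 10,
    pseudocount: float = 1e-3,
) -> List[str]:
    """Hypothetical helper: rank genes for `cluster` by fold change of its mean
    expression over the average mean expression in all other clusters."""
    others = [c for c in mean_expr if c != cluster]
    scores: Dict[str, float] = {}
    for gene, value in mean_expr[cluster].items():
        # Average expression of this gene across the remaining clusters.
        rest = sum(mean_expr[c].get(gene, 0.0) for c in others) / max(len(others), 1)
        # Pseudocount avoids division by zero for genes absent elsewhere.
        scores[gene] = (value + pseudocount) / (rest + pseudocount)
    # Most cluster-specific genes first.
    return sorted(scores, key=lambda g: scores[g], reverse=True)[:top_n]


def build_annotation_prompt(markers: List[str], tissue: str) -> str:
    """Hypothetical prompt template wrapping a ranked marker-gene list."""
    gene_list = ", ".join(markers)
    return (
        f"The following genes are the top-ranked marker genes of a cell cluster "
        f"from {tissue} scRNA-seq data, ordered from most to least specific: "
        f"{gene_list}. Identify the most likely cell type and explain your reasoning."
    )


if __name__ == "__main__":
    # Toy per-cluster mean expression values (made up for illustration).
    toy = {
        "0": {"CD3E": 5.0, "MS4A1": 0.1, "LYZ": 0.2},
        "1": {"CD3E": 0.1, "MS4A1": 4.0, "LYZ": 0.3},
    }
    markers = rank_marker_genes(toy, "0", top_n=2)
    print(build_annotation_prompt(markers, "human blood"))
```

Ranking by fold change keeps the example dependency-free; in practice a statistical test on per-cell expression would usually be preferred for choosing markers, and the resulting prompt would then be sent to the model.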