Poster in Workshop: 2nd Generative AI for Biology Workshop
Intrinsic Evaluation of DNA Embeddings in Genome Language Models: Insights from Yeast Genomic Sequences
Ruhaib Muhammad · Rajeeva Madhan · Roshan Balaji · Nirav Bhatt
Keywords: [ Genomic Language Models ] [ Interpretability ]
In this work, we present a task-independent evaluation of Genome Language Model (gLM) embeddings to understand what contextual and biological information they inherently capture. Using yeast genomic sequences, we design three novel experiments that assess how well embeddings reflect sequence similarity, encode evolutionary context, and respond to synthetic point mutations. Our findings reveal that embedding similarity correlates with sequence similarity, that embeddings cluster by phylogenetic clade, and that robustness to mutation differs between coding and non-coding regions. These results offer new insights into the representational capabilities of gLMs and lay the groundwork for principled interpretability and benchmarking of gLMs.