

Spotlight in Workshop: 2nd Generative AI for Biology Workshop

A Genomic Language Model for Zero-Shot Prediction of Promoter Indel Effects

Courtney Shearer · Felix Teufel · Rose Orenbuch · Christian Steinmetz · Daniel Ritter · Erik Xie · Artem Gazizov · Aviv Spinner · Jonathan Frazer · Mafalda Dias · Pascal Notin · Debora Marks

Keywords: [ machine learning ] [ generative models ] [ genetics ] [ evolutionary sequences ] [ disease ] [ variant effect prediction ]


Abstract:

Disease-associated genetic variants occur extensively across the human genome, predominantly in noncoding regions like promoters. While crucial for understanding disease mechanisms, current methods struggle to predict the effects of insertions and deletions (indels) that can disrupt gene expression. We present LOL-EVE (Language Of Life for Evolutionary Variant Effects), a conditional autoregressive transformer trained on 13.6 million mammalian promoter sequences. By leveraging evolutionary patterns and genetic context, LOL-EVE enables zero-shot prediction of indel effects in human promoters. We introduce three new benchmarks for promoter indel prediction: ultra-rare variant prioritization, causal eQTL identification, and transcription factor binding site disruption analysis. LOL-EVE's dominant performance across these tasks suggests the potential of region-specific genomic language models for identifying causal noncoding variants in disease studies.
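
To make "zero-shot prediction of indel effects" concrete, the sketch below shows one common way an autoregressive sequence model can score a variant: compare the log-likelihood of the variant sequence with that of the reference sequence. This is an illustration under stated assumptions, not the LOL-EVE implementation. The abstract does not specify the scoring rule, and a small k-mer Markov model stands in for the transformer. All function names and toy sequences here are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(sequences, k=2):
    """Count context -> next-base transitions.

    Toy stand-in for a trained genomic language model (assumption:
    the real model is a transformer, not a k-mer Markov chain).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        padded = "^" * k + seq  # "^" pads the start so every base has a context
        for i in range(k, len(padded)):
            counts[padded[i - k:i]][padded[i]] += 1
    return counts, k

def log_likelihood(model, seq):
    """Sum of log P(base | previous k bases) with add-one smoothing."""
    counts, k = model
    padded = "^" * k + seq
    total = 0.0
    for i in range(k, len(padded)):
        ctx = counts.get(padded[i - k:i], {})
        num = ctx.get(padded[i], 0) + 1
        den = sum(ctx.values()) + len(ALPHABET)
        total += math.log(num / den)
    return total

def apply_deletion(ref, start, length):
    """Return ref with `length` bases removed starting at 0-based `start`."""
    return ref[:start] + ref[start + length:]

def apply_insertion(ref, pos, inserted):
    """Return ref with `inserted` placed before 0-based position `pos`."""
    return ref[:pos] + inserted + ref[pos:]

def indel_score(model, ref, var):
    """Log-likelihood ratio of variant vs. reference.

    Assumption: more negative means the model finds the variant less
    plausible than the reference. Raw sums are sensitive to the length
    change an indel introduces, so real pipelines may normalize.
    """
    return log_likelihood(model, var) - log_likelihood(model, ref)

# Hypothetical toy "promoters" sharing a TATA-like motif.
toy_promoters = ["GCGCTATAAAGGCGC", "CCGCTATAAAGGCCG", "GCCGTATAAAGCGCG"]
model = train_markov(toy_promoters, k=2)

reference = "GCGCTATAAAGGCGC"
deletion_variant = apply_deletion(reference, 4, 4)       # removes "TATA"
insertion_variant = apply_insertion(reference, 4, "GG")  # inserts "GG" before the motif

print(indel_score(model, reference, deletion_variant))
print(indel_score(model, reference, insertion_variant))
```

No labeled variant data is used at any point, which is what makes this style of scoring zero-shot: the score comes entirely from what the model learned about plausible sequences during training.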
