

Poster in Workshop: 2nd Generative AI for Biology Workshop

Promoter Sequence Generation using Homology Prompting

Erik Xie · Courtney Shearer · Ruben Weitzman · Pascal Notin · Debora Marks

Keywords: [ regulatory genomics ] [ computational biology ] [ LLM ] [ genomics language model ]


Abstract:

Promoters are critical regulatory elements that control gene expression and harbor disease-associated variants. We present PROSE (PROmoter SEt transformer), a generative model that learns from evolutionary relationships across mammalian species without requiring sequence alignments. PROSE adapts the set transformer architecture to process families of homologous promoters, capturing the patterns of conservation and variation that define functional regulatory elements. Trained on 13.6 million promoter sequences from 447 mammalian species, PROSE generates human promoters that accurately reproduce characteristic motifs, maintain appropriate nucleotide distributions, and achieve strong Sei regulatory activity scores. Unlike single-sequence baselines, which overfit to repetitive patterns, PROSE produces diverse, biologically plausible sequences by leveraging evolutionary context. Our homology-based prompting approach outperforms single-sequence models and demonstrates the value of incorporating cross-species information for genomic sequence design.
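The central architectural idea here is that a family of homologous promoters is treated as an unordered set rather than as an alignment. The Python below is a minimal sketch of that idea only. It is not PROSE's implementation, which this page does not describe. It embeds each homolog independently as k-mer frequencies and pools the set with single-head softmax attention from one seed vector, loosely in the spirit of the pooling-by-attention block of the original set transformer. The k-mer embedding, the random seed vector standing in for a learned one, the toy sequences, and all function names are hypothetical.

```python
import math
import random
from itertools import product

K = 3  # hypothetical k-mer size for this sketch
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_profile(seq):
    """Embed one sequence of any length as normalized k-mer frequencies.

    Each homolog is embedded independently, so no alignment is required.
    """
    counts = [0.0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K].upper())
        if idx is not None:  # skip k-mers containing N or other ambiguity codes
            counts[idx] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(query, elements):
    """Pool a set of vectors with scaled softmax attention from one query vector."""
    scale = math.sqrt(len(query))
    scores = [dot(query, e) / scale for e in elements]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    dim = len(elements[0])
    # Weighted sum over set elements: independent of the order of `elements`.
    return [sum(w * e[d] for w, e in zip(weights, elements)) for d in range(dim)]


def encode_family(sequences, query):
    """Encode a family of homologous sequences as one order-invariant vector."""
    return attention_pool(query, [kmer_profile(s) for s in sequences])


random.seed(0)
# Stand-in for a learned seed vector; random here purely for illustration.
seed_vector = [random.gauss(0.0, 1.0) for _ in KMERS]

# Hypothetical toy "homologs" of unequal length (not real promoter data).
family = [
    "GGGCGGTATAAAAGGCGCGC",
    "GGCGGTATATAAGGCGCG",
    "CCGGGCGGTATAAAGGC",
]
pooled = encode_family(family, seed_vector)
shuffled = encode_family(family[::-1], seed_vector)
print(len(pooled))  # → 64
print(max(abs(a - b) for a, b in zip(pooled, shuffled)) < 1e-12)  # → True
```

Because the pooled vector is a weighted sum over set elements, reordering the homologs leaves it unchanged up to floating-point rounding, and sequences of different lengths need no alignment. A full set transformer would also stack learned self-attention blocks over the set before pooling; this sketch omits them.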
