

Poster in Workshop: 2nd Generative AI for Biology Workshop

No Clear Winner at Small Scale: Comparing Modern Sequence Architectures and Training Strategies for Genomic Language Models

Vera Milovanović · Antonio Orvieto

Keywords: [ Mamba ] [ Attention ] [ genomics ] [ genomic language models ] [ sequence models ]


Abstract:

Pretrained large language models based on a variety of sequence modeling architectures (e.g., Transformers, Mamba, Hyena) are increasingly being applied beyond natural language processing (NLP). In genomics, they have shown potential to reveal intricate structures and dependencies within DNA sequences, particularly in non-coding regions. To guide principled development of training methods and architectures in the genomics domain, in this work we examine the most common classes of sequence modeling architectures found in language models and further explore transfer-learning paradigms such as pretraining on large-scale external datasets and self-pretraining (on the same data, using a reconstruction loss). In contrast to recent works that focus specifically on finetuning large transformers, we show that recent recurrent models (Mamba) and implicit-convolution-based models (Hyena), which are increasingly used for genomic language models, do not offer an advantage over attention-based Transformer models. To enable thorough and controlled comparisons, we adopt a fixed training pipeline and limit our experiments to relatively small-scale models, an approach that still aligns well with the performance trends observed in recent studies.
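To make the self-pretraining paradigm mentioned above concrete, the sketch below illustrates masked-reconstruction pretraining of a small Transformer encoder directly on the downstream genomic sequences. It is a minimal illustrative assumption, not the authors' pipeline: the single-nucleotide tokenization, model size, masking rate, and the `TinyGenomicEncoder` / `mask_tokens` names are placeholders chosen for the example.

```python
# Minimal sketch of self-pretraining with a reconstruction (masked-token) loss.
# Illustrative assumption only: tokenization, model size, and masking rate
# are placeholders, not the configuration used in the paper.
import torch
import torch.nn as nn

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "[MASK]": 4}

class TinyGenomicEncoder(nn.Module):
    def __init__(self, vocab_size=5, d_model=64, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok(x) + self.pos(pos)[None]
        return self.head(self.encoder(h))

def mask_tokens(seqs, mask_id=VOCAB["[MASK]"], rate=0.15):
    # Replace a random subset of positions with [MASK]; unmasked positions
    # get target -100 so the loss ignores them.
    mask = torch.rand_like(seqs, dtype=torch.float) < rate
    targets = torch.where(mask, seqs, torch.full_like(seqs, -100))
    inputs = torch.where(mask, torch.full_like(seqs, mask_id), seqs)
    return inputs, targets

model = TinyGenomicEncoder()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

# Synthetic stand-in for the downstream genomic dataset (random DNA tokens).
seqs = torch.randint(0, 4, (8, 256))

for step in range(100):
    inputs, targets = mask_tokens(seqs)
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The same loop applies to pretraining on a large external corpus; self-pretraining simply swaps that corpus for the downstream data itself before supervised finetuning.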
