

Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)

Quantitative Bounds for Length Generalization in Transformers

Zachary Izzo · Eshaan Nichani · Jason Lee

Keywords: [ LLM theory ] [ transformers ] [ length generalization ]


Abstract:

We provide quantitative bounds on the length of sequences that must be observed during training for a transformer to length generalize, i.e., to continue to perform well on sequence lengths unseen during training. Our results improve on Huang et al. (2024), who show that there exists a finite training length beyond which length generalization is guaranteed, but who do not provide quantitative bounds on that length.
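To make the setting concrete, below is a minimal, illustrative sketch (not from the paper) of how length generalization is typically measured: a model is trained only on sequences up to some length L_train and then probed on longer sequences. The toy copy task, the `identity_model` stand-in, and all names here are assumptions for illustration, not the authors' construction.

```python
# Illustrative sketch of a length-generalization evaluation protocol.
# "model" is any sequence-to-sequence callable; a trivial identity
# baseline stands in for a trained transformer.
import random

def make_copy_example(length, vocab_size=8):
    """A toy copy task: the target is the input sequence itself."""
    seq = [random.randrange(vocab_size) for _ in range(length)]
    return seq, seq  # (input, target)

def identity_model(seq):
    """Placeholder for a trained transformer; simply copies its input."""
    return list(seq)

def accuracy_at_length(model, length, n_examples=100):
    """Fraction of examples of a given length the model gets exactly right."""
    correct = 0
    for _ in range(n_examples):
        x, y = make_copy_example(length)
        correct += (model(x) == y)
    return correct / n_examples

# Suppose training only ever saw sequences of length <= L_train.
# Length generalization means accuracy stays high at lengths beyond L_train.
L_train = 16
for L in (L_train, 2 * L_train, 4 * L_train):
    print(f"length {L}: accuracy {accuracy_at_length(identity_model, L):.2f}")
```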
