Poster
in
Workshop: Methods and Opportunities at Small Scale (MOSS)
Quantitative Bounds for Length Generalization in Transformers
Zachary Izzo · Eshaan Nichani · Jason Lee
Keywords: [ LLM theory ] [ transformers ] [ length generalization ]
We provide quantitative bounds on the training sequence lengths that must be observed for a transformer to length generalize, i.e., to continue to perform well on sequences longer than any seen during training. Our results improve on Huang et al. (2024), who show that there is a finite training length beyond which length generalization is guaranteed, but who do not provide quantitative bounds on that length.
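
To make the notion of length generalization concrete, here is a minimal sketch of the evaluation setup the abstract describes: train (or fit) a model only on sequences up to some maximum length, then measure its accuracy on strictly longer sequences. The parity task, the rule-based model stub standing in for a trained transformer, and the specific lengths are illustrative assumptions, not taken from the paper.

```python
import random

N_TRAIN = 16  # hypothetical maximum sequence length seen during training

def parity(bits):
    """Ground-truth label for a toy parity task."""
    return sum(bits) % 2

def rule_based_model(bits):
    # Stand-in for a trained transformer: this hand-written rule happens to
    # implement parity exactly, so it generalizes to arbitrary lengths.
    acc = 0
    for b in bits:
        acc ^= b
    return acc

def accuracy_at_length(model, length, n_samples=200, seed=0):
    # Fraction of random length-`length` inputs the model labels correctly.
    rng = random.Random(seed)
    correct = sum(
        model(bits) == parity(bits)
        for bits in ([rng.randint(0, 1) for _ in range(length)]
                     for _ in range(n_samples))
    )
    return correct / n_samples

# Evaluate both in-distribution (<= N_TRAIN) and longer, unseen lengths.
for length in (8, N_TRAIN, 4 * N_TRAIN):
    print(length, accuracy_at_length(rule_based_model, length))
```

A model length generalizes when the accuracy at lengths beyond `N_TRAIN` stays high; the paper's contribution is a quantitative bound on how large the training lengths must be for this to be guaranteed.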