

Poster in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models

Multi-stream Sequence Learning

Mohamed Elsayed · Rupam Mahmood


Abstract:

We re-evaluate the suitability of the independent and identically distributed (IID) training paradigm for sequence learning, where long data streams are segmented into shorter, shuffled chunks, thereby breaking their natural continuity and undermining long-range credit assignment. This paper introduces multi-stream sequence learning, a training framework that presents multiple data streams in their natural order. To support this framework, we propose Memora, a recurrent-only architecture whose persistent hidden states make it more suitable for sequence learning than Transformers. Memora builds on the Gated Linear Recurrent Unit (GLRU), a new lightweight recurrent unit designed for efficient parallel training and robust temporal reasoning, and achieves effective learning on long byte-level sequences. Our experiments on structured and byte-level benchmarks demonstrate that models trained under the multi-stream sequence learning framework consistently outperform standard recurrent and state-space models trained in the IID setting, underscoring the importance of preserving continuity in sequence learning.
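The abstract describes two components: a gated linear recurrent cell and a training loop that feeds multiple streams in their natural order while carrying hidden state across chunks. The sketch below illustrates both ideas under stated assumptions; the class and function names (`GLRUCell`, `train_multistream`), the exact gating equation, and the stream layout are illustrative guesses, not the authors' released code or the precise GLRU formulation from the paper.

```python
# Hypothetical sketch of a gated linear recurrence and a multi-stream,
# stateful training loop (assumptions, not the paper's implementation).
import torch
import torch.nn as nn

class GLRUCell(nn.Module):
    """Assumed gating form: h_t = a_t * h_{t-1} + (1 - a_t) * v_t,
    where the gate a_t and candidate v_t depend only on the input x_t.
    Because the recurrence is linear in h, it could also be computed
    with a parallel scan at training time (not shown here)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, dim)   # produces gate a_t in (0, 1)
        self.value = nn.Linear(dim, dim)  # produces candidate v_t

    def forward(self, x, h):
        # x: (batch, seq, dim); h: (batch, dim) persistent hidden state
        outputs = []
        for t in range(x.size(1)):
            a = torch.sigmoid(self.gate(x[:, t]))
            v = self.value(x[:, t])
            h = a * h + (1.0 - a) * v
            outputs.append(h)
        return torch.stack(outputs, dim=1), h

def train_multistream(model, streams, chunk_len, dim, steps, opt, loss_fn):
    """Multi-stream training: each of the B streams is read in natural
    order, and each keeps its own hidden state across consecutive chunks
    instead of shuffling chunks IID."""
    state = torch.zeros(len(streams), dim)  # one hidden state per stream
    cursors = [0] * len(streams)
    for _ in range(steps):
        # Take the next contiguous chunk from every stream.
        chunk = torch.stack(
            [s[c:c + chunk_len] for s, c in zip(streams, cursors)]
        )
        # Detach so gradients are truncated at the chunk boundary,
        # while the state itself persists across chunks.
        out, state = model(chunk, state.detach())
        loss = loss_fn(out)
        opt.zero_grad()
        loss.backward()
        opt.step()
        cursors = [c + chunk_len for c in cursors]
```

Here `streams` is assumed to be a list of tensors of shape `(length, dim)` and `loss_fn` a placeholder objective; the key design choice the sketch tries to convey is that continuity is preserved by advancing a cursor per stream and reusing its hidden state, rather than sampling shuffled chunks.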
