

Poster

Efficient Time Series Processing for Transformers and State-Space Models through Token Merging

Leon Götz · Marcel Kollovieh · Stephan Günnemann · Leo Schwinn

East Exhibition Hall A-B #E-2108
Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Despite recent advances in subquadratic attention mechanisms and state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigation of token merging in time series analysis on both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme, enabling token merging in transformer decoders. Further, we identify spectral properties of the input data that reliably predict the potential benefits of local merging without requiring evaluation on downstream tasks. Our comprehensive empirical evaluation demonstrates that local merging offers substantial efficiency gains with minimal impact on accuracy, achieving up to 5400% acceleration on the recently proposed Chronos foundation model.
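To make the mechanism concrete, the following PyTorch sketch illustrates the general principle of local merging as described in the abstract. It is a simplified illustration under assumptions, not the authors' implementation: the function name "local_merge", the use of non-overlapping windows, the plain averaging of merged pairs, and the parameters "window" and "r" are all choices made for this example.

    import torch


    def local_merge(x: torch.Tensor, window: int = 8, r: int = 1) -> torch.Tensor:
        # Sketch of local token merging: within each non-overlapping window of
        # `window` tokens, greedily average the `r` most similar *adjacent*
        # token pairs. Restricting candidates to a local neighborhood keeps
        # the similarity search linear in sequence length, and merging only
        # neighbors preserves token order, which is what makes a causal
        # (decoder-side) variant possible.
        b, n, d = x.shape
        assert n % window == 0 and r < window, "sketch assumes divisible length"
        wins = x.view(b, n // window, window, d)

        merged_batches = []
        for bi in range(b):
            out_tokens = []
            for wi in range(wins.shape[1]):
                tokens = list(wins[bi, wi])  # `window` tokens of dim `d`
                for _ in range(r):
                    # Cosine similarity of each token with its right neighbor.
                    left = torch.stack(tokens[:-1])
                    right = torch.stack(tokens[1:])
                    sims = torch.cosine_similarity(left, right, dim=-1)
                    j = int(sims.argmax())
                    # Merge the most similar pair by plain averaging (a real
                    # scheme may weight tokens by how many originals they hold).
                    tokens[j] = 0.5 * (tokens[j] + tokens[j + 1])
                    del tokens[j + 1]
                out_tokens.append(torch.stack(tokens))
            merged_batches.append(torch.cat(out_tokens, dim=0))
        return torch.stack(merged_batches)  # (b, n - r * (n // window), d)


    x = torch.randn(2, 64, 32)          # batch of 2 series, 64 tokens, dim 32
    y = local_merge(x, window=8, r=2)   # each window shrinks from 8 to 6 tokens
    print(y.shape)                      # torch.Size([2, 48, 32])

Because each token is only ever compared with its immediate neighbors inside a fixed-size window, the cost of finding merge candidates grows linearly with sequence length, in contrast to global merging schemes that compare all token pairs.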

Lay Summary:

Transformers are good at dealing with time-based sequences, but they can be slow and need a lot of computing power when working with really long sequences. In computer vision (image processing), a technique called token merging has helped to speed up transformers by combining several similar data chunks (tokens) into one. We extend this idea for the first time to time-based sequences. We also introduce a new method called local merging, which only merges tokens that are close together, rather than arbitrary tokens anywhere in the sequence. This makes the method more efficient for long sequences, and it can also be applied in decoder models. We test our method thoroughly and find that it makes models up to 5400% faster with minimal impact on accuracy.
