Poster
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
Shihao Zou · Qingfeng Li · Wei Ji · Jingjing Li · Yongkui Yang · Guoqi Li · Chao Dong
West Exhibition Hall B2-B3 #W-413
Video analysis by computers, such as recognizing human actions or tracking motion, is typically done using powerful AI models called Transformers. However, these models require a lot of energy, which limits their use in devices like drones or wearables. A different kind of AI, called Spiking Neural Networks (SNNs), mimics how the brain works and uses much less energy, but current SNNs don't work well with video data.

Our research introduces SpikeVideoFormer, a new kind of energy-efficient video-processing AI model that combines the strengths of Transformers and SNNs. We designed a special way for this model to "pay attention" to important parts of a video over time using simple brain-like signals rather than complex math (see the sketch below). This keeps processing fast and efficient even for long videos, with a cost that grows only linearly in the number of timesteps.

SpikeVideoFormer achieves excellent performance in tasks like video classification, human pose tracking, and understanding video scenes, matching or beating traditional models while using up to 16 times less energy. This could make smart, energy-efficient video AI possible in more real-world settings.
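To make the "simple brain-like signals" idea concrete, here is a minimal sketch in PyTorch of how spike-driven attention can run in linear time. The function name, tensor shapes, and the reduction of Hamming similarity to dot products are illustrative assumptions, not the paper's exact implementation; in particular, the paper's Hamming attention may score and normalize differently.

```python
import torch


def spike_hamming_attention(q, k, v):
    """Sketch of linear-time attention over binary spike tensors.

    q, k, v: (T, N, d) tensors with values in {0, 1}, where T is the
    number of timesteps, N the tokens per frame, d the feature dim.

    For binary vectors, the Hamming distance satisfies
        H(q_i, k_j) = ||q_i||_1 + ||k_j||_1 - 2 * (q_i . k_j),
    so Hamming similarity is an affine function of the dot product and
    needs no exponentials or divisions. Regrouping Q (K^T V) instead of
    (Q K^T) V makes the cost O(T * N * d^2): linear in T, unlike the
    O(T^2) score matrix of softmax attention.
    """
    T, N, d = q.shape
    q2 = q.reshape(T * N, d)
    k2 = k.reshape(T * N, d)
    v2 = v.reshape(T * N, d)
    kv = k2.t() @ v2    # (d, d) summary shared by all queries
    out = q2 @ kv       # (T*N, d); the (T*N)^2 score map is never formed
    return out.reshape(T, N, d)


if __name__ == "__main__":
    T, N, d = 8, 16, 32
    q = torch.randint(0, 2, (T, N, d)).float()
    k = torch.randint(0, 2, (T, N, d)).float()
    v = torch.randint(0, 2, (T, N, d)).float()
    print(spike_hamming_attention(q, k, v).shape)  # torch.Size([8, 16, 32])
```

Because the inputs are binary spikes, the matrix products above reduce to accumulations rather than multiply-accumulates, which is the standard source of the energy savings claimed for spike-driven designs.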