Poster
in
Workshop: Tokenization Workshop (TokShop)

Motion-Focused Tokenization for Source-Free Video Domain Adaptation

Tzu Ling Liu · Ian Stavness · Mrigank Rochan

Keywords: [ Video tokenization ] [ Action recognition ] [ Domain adaptation ]

Fri 18 Jul 1:50 p.m. PDT — 3 p.m. PDT

Abstract: Source-free video unsupervised domain adaptation (SFVUDA) is a significant challenge in action recognition research. It requires adapting a model pretrained on a labeled source domain to an unlabeled target domain, under the constraint that the source data is inaccessible during adaptation. Despite advances in SFVUDA approaches, their performance remains significantly inferior to that of supervised approaches. We argue that a key reason for this performance bottleneck is the presence of variable static backgrounds in videos, which contribute substantially to domain shift. To address this, we propose Motion-Focused Tokenization (MFT) for SFVUDA. In MFT, we first tokenize source and target video frames into patch tokens, then suppress the low-motion tokens, which largely belong to the background, while retaining the motion-rich tokens corresponding to actions for domain adaptation. Experiments integrating MFT into the best-performing existing SFVUDA method demonstrate a significant improvement ($\sim$2\%) in its performance across two popular domain adaptation (DA) benchmarks, Daily-DA and UCF-HMDB, covering 15 different DA settings.
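The core idea of suppressing low-motion patch tokens can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes grayscale frames, uses the mean absolute temporal difference as a simple motion proxy, and keeps a fixed top fraction of tokens (the function name, patch size, and keep ratio are all hypothetical choices for this sketch).

```python
import numpy as np

def motion_focused_tokenize(frames, patch=4, keep_ratio=0.5):
    """Sketch of motion-focused token selection (hypothetical helper).

    frames: array of shape (T, H, W), a grayscale video clip.
    Splits each frame into non-overlapping patch tokens, scores each
    token by its mean absolute temporal difference (a simple motion
    proxy), and keeps only the top `keep_ratio` most motion-rich
    tokens. Returns a boolean mask over the (Hp, Wp) token grid:
    True = motion-rich token kept, False = low-motion token suppressed.
    """
    T, H, W = frames.shape
    Hp, Wp = H // patch, W // patch
    # Temporal differences between consecutive frames: (T-1, H, W)
    diff = np.abs(np.diff(frames.astype(np.float32), axis=0))
    # Average motion per pixel over time, crop to a whole number of patches
    motion = diff.mean(axis=0)[:Hp * patch, :Wp * patch]
    # Pool pixel-level motion into one score per patch token
    tokens = motion.reshape(Hp, patch, Wp, patch).mean(axis=(1, 3))
    # Threshold at the k-th largest score; ties may keep a few extra tokens
    k = max(1, int(keep_ratio * Hp * Wp))
    thresh = np.sort(tokens.ravel())[::-1][k - 1]
    return tokens >= thresh
```

For example, on a clip where only the top-left quadrant changes between frames, the mask keeps the corresponding token and suppresses the static background tokens. In the actual method, the retained tokens would then be the only ones used during adaptation.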
