Poster
Beyond Entropy: Region Confidence Proxy for Wild Test-Time Adaptation
Zixuan Hu · Yichun Hu · Xiaotong Li · Shixiang Tang · Lingyu Duan
West Exhibition Hall B2-B3 #W-208
Wild Test-Time Adaptation (WTTA) adapts a source model to unseen domains under extreme data scarcity and multiple distribution shifts. Previous approaches focus mainly on sample selection strategies while overlooking the fundamental problem of the underlying optimization. We first critically analyze the entropy minimization framework widely adopted in WTTA and uncover significant limitations in its noisy optimization dynamics that substantially hinder adaptation efficiency. Through this analysis, we identify region confidence as a superior alternative to traditional entropy; however, its direct optimization remains computationally prohibitive for real-time applications. In this paper, we introduce ReCAP, a novel region-integrated method that bypasses this lengthy optimization. Specifically, we propose a probabilistic region modeling scheme that flexibly captures semantic changes in the embedding space. We then develop a finite-to-infinite asymptotic approximation that transforms the intractable region confidence into a tractable, upper-bounded proxy. Together, these innovations unlock the overlooked potential of local-region dynamics in a concise solution. Extensive experiments demonstrate the consistent superiority of ReCAP over existing methods across various datasets and wild scenarios. The source code will be available at https://github.com/hzcar/ReCAP.
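To make the contrast concrete, below is a minimal sketch, not the paper's implementation, of per-sample entropy minimization versus a naive finite-sample stand-in for region confidence: entropy averaged over Gaussian-perturbed copies of an embedding. The Gaussian region model, the names `entropy_loss` and `region_entropy`, and the values of `sigma`, `n_samples`, and the layer sizes are all illustrative assumptions; ReCAP's contribution is precisely to replace this kind of costly sampling with a closed-form, upper-bounded proxy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of softmax predictions: the standard
    entropy-minimization objective used by prior TTA methods."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1).mean()

def region_entropy(head: nn.Module, z: torch.Tensor,
                   sigma: float = 0.1, n_samples: int = 8) -> torch.Tensor:
    """Illustrative finite-sample stand-in for a region-level objective:
    entropy averaged over perturbed copies of the embedding z drawn from
    N(z, sigma^2 I). ReCAP instead derives a tractable closed-form proxy,
    so sigma and n_samples here are hypothetical knobs."""
    noise = sigma * torch.randn(n_samples, *z.shape, device=z.device)
    region = z.unsqueeze(0) + noise        # (n_samples, batch, dim)
    return entropy_loss(head(region))

# Toy usage: one online adaptation step on an unlabeled test batch.
head = nn.Linear(128, 10)                  # classifier head (hypothetical sizes)
opt = torch.optim.SGD(head.parameters(), lr=1e-3)
z = torch.randn(32, 128)                   # embeddings from a frozen backbone
loss = region_entropy(head, z)             # swap in entropy_loss(head(z)) for the baseline
loss.backward()
opt.step()
```

The sampling version scales linearly with `n_samples` per step, which is what makes direct region-confidence optimization prohibitive in real time and motivates the paper's finite-to-infinite approximation.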
When we train AI models to understand visual inputs, they often struggle when shown images or videos that look very different from what they saw during training, especially if only a few new examples are available. This is a major challenge in the real world, where lighting, backgrounds, or styles can change drastically.

Our research focuses on making these models adapt quickly and reliably in such unpredictable situations, a setting we call Wild Test-Time Adaptation (WTTA). While past methods mostly tried to pick better samples to learn from, we looked deeper into how the model learns and found that a commonly used strategy doesn't work well when the data is noisy or inconsistent.

To address this, we propose a new method called ReCAP, which allows the model to grasp local visual patterns more effectively and adjust its internal understanding in real time. It's faster, more robust, and more accurate than existing techniques. We've made the code publicly available to support further research.