Poster in Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models
Yuto Kojima · Jiarui Xu · Xueyan Zou · Xiaolong Wang
Abstract:
We propose LoRA-TTT, a novel test-time training (TTT) method for vision-language models (VLMs) that leverages Low-Rank Adaptation (LoRA), applied exclusively to the image encoder. Unlike prior TTT approaches that rely on computationally intensive text prompt tuning and entropy-based losses, LoRA-TTT updates only the LoRA parameters at test time, achieving substantial performance gains with minimal memory and runtime overhead. We also introduce an efficient reconstruction loss tailored for TTT. Experiments on 15 datasets show that LoRA-TTT improves the zero-shot top-1 accuracy of CLIP-ViT-B/16 by 5.79% on OOD benchmarks and 1.36% on fine-grained benchmarks, without relying on external models or caches.
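The core idea in the abstract — freezing the pretrained image-encoder weights and taking gradient steps only on low-rank LoRA factors at test time, driven by a reconstruction-style loss — can be sketched in a few lines. The toy below is an illustrative numpy sketch, not the paper's implementation: the single linear layer, the masked-view reconstruction objective, and all hyperparameters are assumptions standing in for the actual encoder and loss.

```python
import numpy as np

# Hypothetical sketch of the LoRA-TTT idea: a frozen weight W inside the image
# encoder is adapted at test time by updating only low-rank factors A and B,
# so the effective weight is W + B @ A (W itself never changes).

rng = np.random.default_rng(0)
d, r = 16, 2                              # feature dim, LoRA rank (r << d)

W = rng.standard_normal((d, d))           # frozen pretrained weight
W0 = W.copy()                             # kept only to verify W stays frozen
A = rng.standard_normal((r, d)) * 0.01    # LoRA "down" factor (trainable)
B = np.zeros((d, r))                      # LoRA "up" factor (trainable, zero-init)

def forward(x):
    # effective layer: x @ (W + B @ A)^T; only A and B are trainable
    return x @ (W + B @ A).T

# Toy self-reconstruction objective (an assumption, not the paper's exact loss):
# features of a masked view should match features of the clean test image.
x_clean = rng.standard_normal((8, d))
mask = (rng.random((8, d)) > 0.5).astype(float)
x_masked = x_clean * mask
target = forward(x_clean)                 # treated as a constant target

def recon_loss():
    diff = forward(x_masked) - target
    return 0.5 * float(np.mean(diff ** 2))

loss_before = recon_loss()
lr = 1e-2
for _ in range(100):
    err = forward(x_masked) - target      # dL/dpred for 0.5 * MSE
    grad_M = err.T @ x_masked / len(x_masked)   # grad w.r.t. effective weight
    grad_B = grad_M @ A.T                 # chain rule through M = W + B @ A
    grad_A = B.T @ grad_M
    B -= lr * grad_B                      # only the LoRA factors move;
    A -= lr * grad_A                      # W is never touched
loss_after = recon_loss()
```

Only `A` and `B` (2·r·d parameters) receive gradients, which is what keeps the memory and runtime overhead small relative to tuning text prompts through the full text encoder.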