Poster
in
Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)

LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models

Yuto Kojima · Jiarui Xu · Xueyan Zou · Xiaolong Wang

Fri 18 Jul 2:30 p.m. PDT — 3:15 p.m. PDT

Abstract:

We propose LoRA-TTT, a novel test-time training (TTT) method for vision-language models (VLMs) that leverages Low-Rank Adaptation (LoRA), applied exclusively to the image encoder. Unlike prior TTT approaches that rely on computationally intensive text prompt tuning and entropy-based loss, LoRA-TTT updates only LoRA parameters at test time, achieving substantial performance gains with minimal memory and runtime overhead. We also introduce an efficient reconstruction loss tailored for TTT. Experiments on 15 datasets show that LoRA-TTT improves zero-shot top-1 accuracy of CLIP-ViT-B/16 by 5.79% on OOD and 1.36% on fine-grained benchmarks, without using external models or caches.
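To make the idea concrete, the sketch below illustrates the general pattern the abstract describes: the base encoder weights stay frozen, only low-rank LoRA factors are updated at test time, and the update is driven by a self-supervised reconstruction-style loss on the unlabeled test input. This is a toy illustration with a linear "encoder" and manually derived gradients, not the paper's actual architecture or loss; all dimensions, the identity-reconstruction target, and the learning rate are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                                    # feature dim and LoRA rank (toy sizes)
W = rng.standard_normal((d, d)) / np.sqrt(d)   # frozen base "encoder" weight
A = rng.standard_normal((r, d)) * 0.1          # LoRA down-projection (trainable)
B = np.zeros((d, r))                           # LoRA up-projection, zero-init (trainable)

x = rng.standard_normal(d)                     # one unlabeled test sample

def loss_and_grads(A, B):
    """Toy reconstruction loss L = ||(W + B A) x - x||^2 and its LoRA gradients."""
    u = A @ x                                  # low-rank bottleneck activation
    y = W @ x + B @ u                          # adapted encoder output
    e = y - x                                  # reconstruction error (target = input, a stand-in)
    gB = 2.0 * np.outer(e, u)                  # dL/dB
    gA = 2.0 * np.outer(B.T @ e, x)            # dL/dA
    return float(e @ e), gA, gB

lr = 0.01
losses = []
for _ in range(300):                           # a few test-time gradient steps
    L, gA, gB = loss_and_grads(A, B)
    losses.append(L)
    A -= lr * gA                               # only the LoRA factors move;
    B -= lr * gB                               # W never changes, mirroring LoRA-TTT

print(f"loss before TTT: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

The key property the sketch shows is the memory footprint: the trainable state is `A` and `B` (2·d·r values) rather than the full d·d weight, which is what makes per-sample test-time updates cheap.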
