Poster in Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
Distilling Prompts at Test-Time for Multimodal Few-Shot Learning
Akash Gupta · Amos Storkey · Mirella Lapata
In-Context Learning (ICL) is a well-established paradigm for adapting Large Multimodal Models (LMMs) to novel tasks with minimal supervision. However, the ICL performance of LMMs improves inconsistently as the number of in-context examples grows, because image embeddings carry additional information that is irrelevant to the downstream task. To address this, we introduce a meta-learning strategy that distills task-relevant image features into a fixed set of soft prompts, which can be fine-tuned with just a few examples at test time. To facilitate this distillation, we further propose an attention-mapper module that is integrated into the LLaVA v1.5 architecture and trained alongside the soft prompts, enabling rapid adaptation under low-data conditions. On the VL-ICL Benchmark, our method outperforms ICL and other prompt distillation approaches and boosts the few-shot visual question-answering performance of LMMs.
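To make the distillation idea concrete, below is a minimal sketch of what an attention-mapper could look like: a fixed set of learnable soft prompts that cross-attend to image-patch embeddings, compressing them into a small number of prompt vectors. The abstract does not specify the architecture, so all names, dimensions, and the single-block design here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionMapper(nn.Module):
    """Sketch of an attention-mapper: learnable soft prompts query the
    image embeddings via cross-attention, distilling task-relevant
    visual features into a fixed number of prompt vectors.

    Hypothetical design; sizes chosen to match LLaVA v1.5's hidden dim.
    """

    def __init__(self, num_prompts: int = 16, dim: int = 4096, num_heads: int = 8):
        super().__init__()
        # Fixed-size soft prompts, fine-tuned with a few examples at test time.
        self.soft_prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_embeds: torch.Tensor) -> torch.Tensor:
        # image_embeds: (batch, num_patches, dim) from the vision encoder.
        batch = image_embeds.size(0)
        queries = self.soft_prompts.unsqueeze(0).expand(batch, -1, -1)
        distilled, _ = self.cross_attn(queries, image_embeds, image_embeds)
        # Residual connection keeps the learned prompts as a task prior.
        return self.norm(queries + distilled)

# Hypothetical usage: the distilled prompts would stand in for the full
# image-token sequence when fed to the language model.
mapper = AttentionMapper()
image_embeds = torch.randn(2, 576, 4096)  # e.g. LLaVA v1.5 patch tokens
prompts = mapper(image_embeds)            # (2, 16, 4096)
```

At test time, only the soft prompts (and possibly the mapper) would be updated on the few available examples, which is what keeps adaptation cheap under low-data conditions.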