Poster in Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)

Theoretical Insights into In-context Learning with Unlabeled Data

Yingcong Li · Xiangyu Chang · Muti Kara · Xiaofeng Liu · Amit Roy-Chowdhury · Samet Oymak

[ Project Page ]
Fri 18 Jul 2:30 p.m. PDT — 3:15 p.m. PDT

Abstract: Recent research shows that in-context learning (ICL) can be effective even when demonstrations have missing or incorrect labels. This motivates a deeper understanding of how sequence models leverage unlabeled data. We consider a canonical setting where the in-context demonstrations are drawn from a binary Gaussian mixture model (GMM) and a certain fraction of the demonstrations have missing labels. We provide a comprehensive theoretical study showing that: (1) One-layer linear attention, at the optimum of its loss landscape, learns the optimal fully supervised learner but completely fails to leverage the unlabeled data. (2) Multilayer and looped transformers can effectively leverage unlabeled data by implicitly constructing estimators of the form $\sum_{i\ge 0} a_i (X^\top X)^i X^\top y$, where $X$ and $y$ denote the features and the visible labels. We shed light on the class of polynomials that can be expressed as a function of depth/looping and draw connections to iterative pseudo-labeling.
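
To make the estimator family concrete, below is a minimal NumPy sketch (not the authors' implementation) that draws demonstrations from a binary GMM, hides a fraction of the labels, and evaluates $w = \sum_{i} a_i (X^\top X)^i X^\top y$ for a few polynomial degrees. The mixture construction, sample sizes, Gram normalization, and coefficients a_i are all illustrative assumptions; in the paper these coefficients arise implicitly from depth/looping rather than being hand-set.

# Illustrative sketch of the estimator family
# w = sum_{i>=0} a_i (X^T X)^i X^T y on a binary GMM with missing labels.
# All hyperparameters below (d, n, frac_labeled, coefficients) are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n, frac_labeled = 16, 200, 0.3

# Binary GMM: label y in {-1, +1}, feature x = y * mu + Gaussian noise.
mu = rng.standard_normal(d) / np.sqrt(d)
y_true = rng.choice([-1.0, 1.0], size=n)
X = y_true[:, None] * mu[None, :] + rng.standard_normal((n, d))

# Encode missing labels as 0: X^T y then sums only the visible labels,
# while powers of the Gram matrix X^T X still see every feature vector.
visible = rng.random(n) < frac_labeled
y = np.where(visible, y_true, 0.0)

def poly_estimator(X, y, coeffs):
    """Evaluate w = sum_i coeffs[i] * G^i b via Horner's rule,
    with G = X^T X / n (Gram matrix) and b = X^T y / n."""
    G = X.T @ X / len(X)
    b = X.T @ y / len(X)
    w = np.zeros_like(b)
    for a in reversed(coeffs):  # Horner: a_0 b + G (a_1 b + G (a_2 b + ...))
        w = a * b + G @ w
    return w

# Degree 0 is plain supervised averaging (what one-layer linear attention
# can express); higher degrees fold in unlabeled structure through
# powers of the Gram matrix, mimicking extra layers/loops.
for coeffs in ([1.0], [1.0, 0.5], [1.0, 0.5, 0.25]):
    w = poly_estimator(X, y, coeffs)
    acc = np.mean(np.sign(X @ w) == y_true)
    print(f"degree {len(coeffs) - 1}: accuracy = {acc:.3f}")

The pseudo-labeling connection mentioned in the abstract is visible in the recursion: each application of the Gram matrix computes $G b \propto X^\top (X b)$, i.e., it scores all points (labeled and unlabeled) with the current predictor $Xb$ and feeds those scores back in as soft labels.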
