Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)
Theoretical Insights into In-context Learning with Unlabeled Data
Yingcong Li · Xiangyu Chang · Muti Kara · Xiaofeng Liu · Amit Roy-Chowdhury · Samet Oymak
Abstract:
Recent research shows that in-context learning (ICL) can be effective even in settings where demonstrations have missing or incorrect labels. This motivates a deeper understanding of how sequence models leverage unlabeled data. We consider a canonical setting where the in-context demonstrations are drawn from a binary Gaussian mixture model (GMM) and a certain fraction of the demonstrations have missing labels. We provide a comprehensive theoretical study showing that: (1) One-layer linear attention learns the optimal fully-supervised learner but completely fails to leverage the unlabeled data. (2) Multilayer as well as looped transformers can effectively leverage unlabeled data by implicitly constructing estimators of the form $\sum_{i\ge 0} a_i (X^\top X)^i X^\top y$, with $X$ and $y$ denoting the features and visible labels. We shed light on the class of polynomials that can be expressed as a function of depth/looping and draw connections to iterative pseudo-labeling.
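Below is a minimal numerical sketch of the estimator family $\sum_{i\ge 0} a_i (X^\top X)^i X^\top y$ on a toy binary GMM with a large fraction of missing labels. The dimensions, noise level, and the convention of encoding hidden labels as zero are illustrative assumptions, not the paper's construction. The sketch isolates the single powers $(X^\top X)^i X^\top y$ and uses the identity $X^\top (X w) = (X^\top X)\, w$, under which the $i$-th power corresponds to $i$ rounds of "soft" pseudo-labeling started from the plain supervised aggregate $w_0 = X^\top y$:

```python
# Illustrative sketch (not the paper's construction): compare the plain
# supervised aggregate X^T y against the powers (X^T X)^i X^T y on a
# binary GMM where most labels are hidden. Since X^T (X w) = (X^T X) w,
# each extra power is one round of soft pseudo-labeling: predict soft
# labels X w for every demonstration, then re-aggregate with X^T.
import numpy as np

rng = np.random.default_rng(0)
d, n, frac_missing, sigma = 20, 500, 0.8, 1.0  # illustrative choices

# Binary GMM: x = y * mu + sigma * noise, with y uniform on {-1, +1}.
mu = rng.normal(size=d)
mu /= np.linalg.norm(mu)
y_true = rng.choice([-1.0, 1.0], size=n)
X = y_true[:, None] * mu + sigma * rng.normal(size=(n, d))

# Hide most labels; zeros drop those demonstrations from X^T y,
# while (X^T X) still sees every feature vector, labeled or not.
y = np.where(rng.random(n) < frac_missing, 0.0, y_true)

G = X.T @ X / n   # second-moment matrix: uses ALL features, incl. unlabeled
w = X.T @ y / n   # supervised aggregate: visible labels only

for i in range(4):
    align = abs(mu @ w) / np.linalg.norm(w)  # cosine alignment with truth
    print(f"(X^T X)^{i} X^T y: alignment with mu = {align:.3f}")
    w = G @ w  # one more soft pseudo-labeling round
```

In this toy setting, each additional power of $X^\top X$ (which deeper or more-looped models can express) pulls the estimate toward the principal direction of all the features, which is one way the unlabeled demonstrations can inject information that the plain aggregate $X^\top y$ ignores.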