Poster
Nonlinear transformers can perform inference-time feature learning
Naoki Nishikawa · Yujin Song · Kazusato Oko · Denny Wu · Taiji Suzuki
West Exhibition Hall B2-B3 #W-909
Modern language models can learn new tasks at test time simply by observing a few examples, a phenomenon known as in-context learning. While it is well known that these models can implement a variety of algorithms in this way, the mechanism by which gradient-based training gives rise to such test-time adaptability remains poorly understood. Our research addresses this gap by studying a class of tasks in which the outcome depends on an unknown low-dimensional feature. We demonstrate that pretrained transformers can adaptively recover this feature during inference, without any retraining. These findings provide new theoretical insight into the sample efficiency of transformers at test time, along with provable guarantees that explain how this capability emerges from training.
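To make the task class concrete, below is a minimal sketch of how such an in-context prediction problem can be generated: each prompt hides a fresh low-dimensional feature direction, and the model must infer it from the context examples alone, with no weight updates. The NumPy code, the function name, and the ReLU link function are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def sample_icl_task(d=64, n_context=32, rng=None):
    """Sample one in-context task: the label depends on a single
    unknown feature direction w (a single-index-style model,
    used here only as an illustration)."""
    rng = np.random.default_rng() if rng is None else rng
    # Unknown low-dimensional feature, drawn fresh for each task.
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    # Hypothetical link function (ReLU chosen purely for illustration).
    g = lambda z: np.maximum(z, 0.0)
    # Context examples the transformer observes at inference time.
    X = rng.standard_normal((n_context, d))
    y = g(X @ w) + 0.1 * rng.standard_normal(n_context)
    # Query input whose label must be predicted in context.
    x_query = rng.standard_normal(d)
    y_query = g(x_query @ w)
    return X, y, x_query, y_query

# The transformer receives (X, y, x_query) as a prompt; no retraining occurs.
X, y, x_query, y_query = sample_icl_task()
print(X.shape, y.shape, x_query.shape)  # (32, 64) (32,) (64,)
```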