Poster
In-Context Deep Learning via Transformer Models
Weimin Wu · Maojiang Su · Jerry Yao-Chieh Hu · Zhao Song · Han Liu
East Exhibition Hall A-B #E-3202
Training deep neural networks from scratch is expensive and time-consuming. We ask: can a powerful pretrained transformer simulate the training of another deep model without updating its own parameters? This question matters because an affirmative answer could make machine learning far more efficient and accessible.

We show that transformers can perform such "in-context learning," effectively simulating multiple steps of gradient descent simply by observing example data. We construct a transformer architecture that replicates the training process of deep neural networks and provide theoretical guarantees on its accuracy and convergence. We also validate our results through experiments.

Our results suggest a path where a single foundation model can enable the training of many others, reducing redundancy and computational cost.
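A minimal sketch of the setup, not the paper's construction: a small "deep model" is trained by explicit gradient descent on in-context examples (x_i, y_i), while a frozen transformer receives the same examples plus a query as a prompt. The paper's claim is that a suitably constructed transformer's forward pass emulates those gradient-descent steps; the transformer below is randomly initialized and only illustrates the prompt format and inference without parameter updates. All names, dimensions, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy task: in-context examples (x_i, y_i) and a query point x_query.
d, n_ctx = 4, 32
X_ctx = torch.randn(n_ctx, d)
w_true = torch.randn(d, 1)
y_ctx = X_ctx @ w_true + 0.01 * torch.randn(n_ctx, 1)
x_query = torch.randn(1, d)

# Baseline: explicit gradient descent on a small network (the "deep model").
net = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)
for _ in range(10):  # T steps of gradient descent
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X_ctx), y_ctx)
    loss.backward()
    opt.step()
pred_gd = net(x_query)  # prediction after explicit training

# In-context alternative: a frozen transformer reads the same data as a prompt.
# Each token is an (x_i, y_i) pair; the query token carries a placeholder label.
tokens = torch.cat(
    [
        torch.cat([X_ctx, y_ctx], dim=1),                # context tokens
        torch.cat([x_query, torch.zeros(1, 1)], dim=1),  # query token
    ],
    dim=0,
).unsqueeze(0)  # shape (1, n_ctx + 1, d + 1)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d + 1, nhead=1, batch_first=True),
    num_layers=4,
)
readout = nn.Linear(d + 1, 1)
with torch.no_grad():  # no parameter updates: prediction comes from the forward pass
    pred_icl = readout(encoder(tokens))[0, -1]  # read off the query slot

print(pred_gd.item(), pred_icl.item())
```

In the paper's setting, the transformer's weights are fixed after pretraining (or given by an explicit construction), so the only "training" of the target model happens implicitly inside the forward pass over the prompt.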