

Poster in Affinity Workshop: New In ML

Evaluating Weight Selection for Language Models: A Negative Result and Boundary Conditions

Xiantao Zhang


Abstract:

Weight selection from larger pretrained models has demonstrated notable success in initializing smaller Vision Transformers (ViTs). Motivated by these results, we systematically investigate whether the paradigm extends effectively to initializing smaller Large Language Models (LLMs). Our findings constitute a clear negative result: across a variety of layer and element selection strategies, weight selection consistently fails to outperform, and often underperforms, standard random initialization for LLMs on both in-domain and out-of-domain data. This study diagnoses a critical failure mode in transferring this initialization technique to the language domain, thereby delineating important boundary conditions for the weight selection paradigm.
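For context, the weight-selection paradigm under evaluation can be sketched roughly as follows. This is a minimal illustration assuming uniform layer selection and first-N element selection, the strategies proposed in the original ViT work; the function and tensor names are illustrative and not taken from this paper.

    # Illustrative sketch of weight selection, not the paper's code.
    # Assumes: uniform layer selection across depth and first-N element
    # selection (leading slices) within each weight tensor.
    import torch

    def select_weights(teacher_state, n_teacher_layers, n_student_layers, student_shapes):
        """Initialize a smaller model's state dict from a larger pretrained one."""
        stride = n_teacher_layers / n_student_layers
        student_state = {}
        for name, shape in student_shapes.items():
            src_name = name
            if name.startswith("layers."):
                # Layer selection: map each student layer to a teacher layer
                # spaced uniformly across the teacher's depth.
                parts = name.split(".")
                parts[1] = str(int(int(parts[1]) * stride))
                src_name = ".".join(parts)
            teacher_tensor = teacher_state[src_name]
            # Element selection: keep the leading slice along every dimension
            # so the tensor matches the student's (smaller) shape.
            slices = tuple(slice(0, d) for d in shape)
            student_state[name] = teacher_tensor[slices].clone()
        return student_state

    # Toy usage: a 4-layer, width-8 "teacher" initializes a 2-layer, width-4 student.
    teacher = {f"layers.{i}.weight": torch.randn(8, 8) for i in range(4)}
    shapes = {f"layers.{i}.weight": (4, 4) for i in range(2)}
    student = select_weights(teacher, 4, 2, shapes)
    print({k: v.shape for k, v in student.items()})

The abstract's negative result is that initializations of this kind, under a variety of layer and element selection strategies, do not beat random initialization for LLMs.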
