Poster
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
Zihang Liu · Tianyu Pang · Oleg Balabanov · Chaoqun Yang · Tianjin Huang · Lu Yin · Yaoqing Yang · Shiwei Liu
East Exhibition Hall A-B #E-2604
Modern language-based AI models learn to “reason” by adapting their weights to complex tasks through fine-tuning. As AI models continue to grow to the billion-parameter scale, it becomes crucial to develop fine-tuning methods that combine superior performance with better efficiency. To address this problem, in this paper we study sparse fine-tuning, which updates only a tiny subset of model weights. A central challenge for sparse fine-tuning is identifying that critical subset. We find that the components critical to fine-tuning can be characterized by the top eigenspace of the weight matrix: the weights with the largest magnitude after low-rank approximation are the Principal Weights critical to fine-tuning. We then design a method that fine-tunes only the Principal Weights, which we name Low-rank Informed Sparse Fine-Tuning (LIFT).

In empirical studies, we find that LIFT achieves stronger results on reasoning tasks than dense fine-tuning methods, while better preserving the knowledge the model already has. Furthermore, the memory overhead of LIFT is significantly lower than that of dense fine-tuning and comparable to the best efficient fine-tuning methods.

By “lifting the veil” with low-rank approximation and fine-tuning the largest-magnitude weights, LIFT finds the “truth” within model weights: the components critical to fine-tuning. This work provides insight into determining the critical components of model weights and inspires future research on designing more efficient fine-tuning approaches that improve the reasoning ability of large AI models.
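
For concreteness, the selection step might look like the minimal PyTorch sketch below. This is an illustration of the idea as described in the abstract, not the paper's released implementation; the function name principal_weight_mask and the hyperparameters rank and density are assumptions introduced here.

```python
import torch

def principal_weight_mask(W: torch.Tensor, rank: int, density: float) -> torch.Tensor:
    """Return a boolean mask over W selecting the Principal Weights.

    Sketch of the idea in the abstract: low-rank approximate W with a
    truncated SVD, then keep the entries whose magnitudes are largest
    *after* rank reduction. `rank` and `density` are illustrative
    hyperparameters, not the paper's notation.
    """
    # Truncated SVD keeps the top singular directions (the "top eigenspace").
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    W_low = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

    # Principal Weights: the largest-magnitude entries of the rank-reduced matrix.
    k = max(1, int(density * W.numel()))
    threshold = torch.topk(W_low.abs().flatten(), k).values.min()
    return W_low.abs() >= threshold

# During sparse fine-tuning, only the masked entries would be updated, e.g.
# by zeroing gradients outside the mask before each optimizer step:
#     W.grad.mul_(mask)
```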