Poster in Workshop: ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models

An Efficient Row-Based Sparse Fine-Tuning with Low Quantization Error

Cen-Jhih Li · Aditya Bhaskara


Abstract:

Fine-tuning is essential for adapting large language models to downstream tasks, but it can be costly for users with limited resources. To address this, sparse fine-tuning (SpFT) and low-rank adaptation (LoRA) have been widely adopted for efficient fine-tuning. In this work, we propose a new SpFT framework inspired by neural network pruning: we identify important neurons using structural pruning and fine-tune only the associated weights. Experiments on common language tasks show our method improves SpFT's memory efficiency by 20–50% while matching the accuracy of state-of-the-art methods such as LoRA variants.
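A minimal sketch of the general idea of row-based sparse fine-tuning in PyTorch: score output neurons, keep only the top-scoring rows of each weight matrix trainable, and freeze the rest via gradient masking. The importance score (row L2 norm) and the 10% row budget are illustrative assumptions, not the paper's actual pruning criterion or hyperparameters.

```python
import torch
import torch.nn as nn


def select_rows_by_norm(weight: torch.Tensor, keep_ratio: float = 0.1) -> torch.Tensor:
    """Pick rows (output neurons) with the largest L2 norm (assumed importance proxy)."""
    scores = weight.norm(dim=1)                       # one score per output neuron
    k = max(1, int(keep_ratio * weight.size(0)))
    return torch.topk(scores, k).indices              # indices of rows to fine-tune


def attach_row_mask(linear: nn.Linear, row_idx: torch.Tensor) -> None:
    """Zero gradients for all rows except the selected ones, so only those rows update."""
    mask = torch.zeros_like(linear.weight)
    mask[row_idx] = 1.0
    linear.weight.register_hook(lambda g: g * mask)   # gradient masking keeps frozen rows fixed
    if linear.bias is not None:
        bias_mask = torch.zeros_like(linear.bias)
        bias_mask[row_idx] = 1.0
        linear.bias.register_hook(lambda g: g * bias_mask)


# Usage: mask every linear layer of a toy model, then train as usual.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
for module in model.modules():
    if isinstance(module, nn.Linear):
        rows = select_rows_by_norm(module.weight)
        attach_row_mask(module, rows)

# Weight decay is disabled so the masked (frozen) rows stay exactly unchanged.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()                                      # only the selected rows receive updates
```

Because only a small fraction of rows carries gradients and optimizer state, the memory footprint of fine-tuning shrinks accordingly; the specific savings reported above come from the paper's own method, not this sketch.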
