Poster
Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
Tianyi Zhang · Junda Su · Aditya Desai · Oscar Wu · Zhaozhuo Xu · Anshumali Shrivastava
East Exhibition Hall A-B #E-2010
Large language models (LLMs), such as those used in chatbots and search engines, require substantial computational resources because of their enormous size. This creates challenges when adapting these models to new tasks, as updating or fine-tuning them is often slow and memory-intensive.

Our work introduces SketchTune, a new technique that first compresses the LLM into a much smaller, "sketched" version. Unlike most compression methods, SketchTune makes this compressed model fully trainable, so it can still be adapted to new tasks. Instead of updating all of the original model's parameters, SketchTune fine-tunes only a small set of parameters within the compressed model.

We show that models compressed and adapted with SketchTune can achieve similar or even better performance than traditional methods while using far less memory and compute. This makes it easier and more efficient for a wider range of people and organizations to use and customize powerful LLMs.
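To make the idea concrete, the toy Python sketch below illustrates the general concept of a compressed-yet-trainable weight representation: every entry of a dense weight matrix is mapped to one of K shared values (a small codebook), the integer assignments stay fixed, and "fine-tuning" updates only the K codebook values. This is a minimal illustration under our own simplifying assumptions (uniform-quantile bucketing, a toy least-squares objective), not SketchTune's actual sketching or training algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Original dense weight matrix of a toy layer.
W = rng.normal(size=(256, 256)).astype(np.float32)

# "Sketch" the weights: map every entry to one of K shared values.
# The integer assignments are fixed; only the K codebook values are trainable.
K = 16
edges = np.quantile(W, np.linspace(0, 1, K + 1))
idx = np.clip(np.searchsorted(edges, W, side="right") - 1, 0, K - 1)
codebook = np.array(
    [W[idx == k].mean() if np.any(idx == k) else 0.0 for k in range(K)],
    dtype=np.float32,
)

def sketched_weight():
    # Reconstruct the full matrix from K shared values + fixed integer indices.
    return codebook[idx]

# Toy "fine-tuning": nudge only the K codebook entries toward a target output,
# leaving the much larger index tensor untouched.
x = rng.normal(size=(256,)).astype(np.float32)
target = rng.normal(size=(256,)).astype(np.float32)
lr = 1e-4
for _ in range(50):
    y = sketched_weight() @ x
    err = y - target                      # dL/dy for 0.5 * ||y - target||^2
    grad_W = np.outer(err, x)             # dL/dW
    # Gradient of each codebook entry = sum of grads over entries mapped to it.
    grad_code = np.array([grad_W[idx == k].sum() for k in range(K)],
                         dtype=np.float32)
    codebook -= lr * grad_code

print("trainable parameters:", K, "vs. original:", W.size)
```

In this toy setup only 16 numbers are updated instead of 65,536, which mirrors the abstract's point that adapting a sketched model touches a small parameter set rather than the full weight matrix.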