ICML Poster FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain

Rohan Deb · Kiran Thekumparampil · Kousha Kalantari · Gaurush Hiranandani · Shoham Sabach · Branislav Kveton

East Exhibition Hall A-B #E-1601

Wed 16 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

Supervised fine-tuning (SFT) is the most common way of adapting large language models (LLMs) to a new domain. In this paper, we improve the efficiency of SFT by selecting an informative subset of training examples. Specifically, for a fixed budget of training examples, which determines the computational cost of fine-tuning, we select those that maximize information gain, as measured by the Fisher information matrix of the SFT objective. We approximate it efficiently by linearization at the last layer of the LLM. Our approach is computationally efficient, analyzable, and performs well empirically. We demonstrate this on several problems, with both quantitative results and LLM-as-a-judge evaluations.
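
To make the selection idea concrete, here is a minimal sketch of budgeted subset selection by greedy log-determinant maximization. It is an illustration under stated assumptions, not the authors' algorithm: each training example is assumed to be summarized by a single feature vector (for instance, a last-layer representation), the Fisher information is simplified to a regularized sum of outer products of those vectors (which is exact only for a linear-Gaussian model), and the function name `greedy_logdet_select` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def greedy_logdet_select(features, budget, ridge=1.0):
    """Greedily pick `budget` rows of `features` to maximize
    log det(ridge * I + sum of x x^T over the selected rows).

    Assumption: each row is a per-example feature vector standing in
    for a last-layer representation. This is a simplified proxy for
    information gain, not the method from the paper.
    """
    features = np.asarray(features, dtype=float)
    n, d = features.shape
    if not 0 <= budget <= n:
        raise ValueError("budget must be between 0 and the number of examples")

    inv_cov = np.eye(d) / ridge          # inverse of the current design matrix
    available = np.ones(n, dtype=bool)   # examples not yet selected
    selected = []

    for _ in range(budget):
        # By the matrix determinant lemma, adding x raises the log det
        # by log(1 + x^T A^{-1} x), so ranking by x^T A^{-1} x suffices.
        scores = np.einsum("ij,ij->i", features @ inv_cov, features)
        scores[~available] = -np.inf
        best = int(np.argmax(scores))
        selected.append(best)
        available[best] = False

        # Sherman-Morrison rank-one update of the inverse.
        v = inv_cov @ features[best]
        inv_cov -= np.outer(v, v) / (1.0 + scores[best])

    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_features = rng.normal(size=(200, 16))  # synthetic stand-in data
    chosen = greedy_logdet_select(toy_features, budget=10)
    print("selected example indices:", chosen)
```

The greedy rule favors examples whose feature vectors point in directions the already selected set covers poorly, which is the intuition behind choosing examples that "maximize information gain" for a fixed budget. The rank-one update keeps each step at roughly O(n d^2) cost instead of recomputing a determinant per candidate.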

Lay Summary:

Imagine that you want to teach a large language model (like ChatGPT) to do better on a new task, but you can only give it a limited number of examples to learn from. We carefully choose the ones that teach the model the most. We do this efficiently using tools from statistics, and we show that our approach works well in both theory and practice.
