

Invited Talk in Workshop: DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)

There's No Free Lunch in Safety in Fine-tuning Large Language Models

Pin-Yu Chen

Sat 19 Jul 10:30 a.m. PDT — 11 a.m. PDT

Abstract:

A common use case of generative AI technology is fine-tuning large language models (LLMs) on domain-specific datasets to improve their capabilities on certain downstream tasks. However, recent studies have shown that fine-tuning aligned LLMs can significantly degrade their safety, even when the fine-tuning data contains no malicious intent. This talk first explores the potential safety risks of fine-tuning LLMs, then presents cost-effective mitigation strategies, and finally explains the root causes of the fundamental trade-offs between safety and capability in LLM fine-tuning.
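To make the setting concrete, below is a minimal sketch (not the speaker's setup) of the scenario the abstract describes: supervised fine-tuning of an already safety-aligned chat model on a benign, domain-specific dataset using the Hugging Face transformers/datasets APIs. The model name, data file, field names, and hyperparameters are illustrative assumptions.

```python
# Minimal fine-tuning sketch; all names below are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-chat-hf"   # any safety-aligned chat model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical benign domain corpus with "prompt" / "response" fields.
raw = load_dataset("json", data_files="domain_data.jsonl", split="train")

def tokenize(batch):
    text = [p + "\n" + r for p, r in zip(batch["prompt"], batch["response"])]
    return tokenizer(text, truncation=True, max_length=512)

train = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out",
                           num_train_epochs=1,
                           per_device_train_batch_size=4,
                           learning_rate=2e-5),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# The phenomenon the talk examines: even after fine-tuning on benign data like
# this, the model's refusal/safety behavior can degrade relative to the base
# aligned model, so safety should be re-evaluated after fine-tuning.
```

In this kind of pipeline, nothing in the benign training data signals a safety risk, which is why the degradation the abstract refers to motivates post-fine-tuning safety evaluation and mitigation.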
