Poster in Workshop: The Impact of Memorization on Trustworthy Foundation Models
ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
Tong Chen · Faeze Brahman · Jiacheng Liu · Niloofar Mireshghallah · Weijia Shi · Pang Wei Koh · Luke Zettlemoyer · Hannaneh Hajishirzi
Language models (LMs) can memorize and reproduce segments from their pretraining data verbatim even in non-adversarial settings, raising concerns about copyright, plagiarism, privacy, and creativity. We introduce Paraphrase Preference Optimization (ParaPO), a post-training method that fine-tunes LMs to reduce unintentional regurgitation while preserving their overall utility. ParaPO trains LMs to prefer paraphrased versions of memorized segments over the original verbatim content from the pretraining data. To maintain the ability to recall famous quotations when appropriate, we develop a variant of ParaPO that uses system prompts to control regurgitation behavior. In our evaluation on Llama3.1-8B, ParaPO consistently reduces regurgitation across all tested datasets, achieving a 25.4% reduction in unintentional regurgitation in creative writing, whereas unlearning methods are less effective outside their unlearned domain (achieving only a 2.3% reduction). On the instruction-tuned Tulu3-8B model, ParaPO combined with system prompting preserves desirable quotation recall while reducing unintentional regurgitation by 27.5% in creative writing when instructed not to regurgitate. In contrast, without ParaPO tuning, prompting the model not to regurgitate produces only a marginal reduction.
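The abstract describes the training signal at a high level: for each memorized pretraining segment, a paraphrase is treated as the preferred completion and the verbatim segment as the dispreferred one. The sketch below is not the paper's implementation; it only illustrates how such paraphrase-vs-verbatim pairs could be plugged into a standard DPO-style preference loss. The function name, variable names, and the toy log-probability values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def paraphrase_preference_loss(policy_chosen_logps, policy_rejected_logps,
                               ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO-style loss over (paraphrase, verbatim) preference pairs.

    chosen   = paraphrased version of a memorized pretraining segment
    rejected = the original verbatim segment
    Each input is the summed token log-probability of that completion
    under the trainable policy or the frozen reference model.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Push the policy to widen its preference for the paraphrase
    # relative to the reference model's preference.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()


# Toy usage with made-up log-probabilities for a batch of two pairs.
policy_chosen = torch.tensor([-42.0, -37.5])
policy_rejected = torch.tensor([-40.0, -36.0])
ref_chosen = torch.tensor([-43.0, -38.0])
ref_rejected = torch.tensor([-39.5, -35.5])

loss = paraphrase_preference_loss(policy_chosen, policy_rejected,
                                  ref_chosen, ref_rejected)
print(loss.item())
```

In this framing, the system-prompted variant would differ only in the conditioning context: pairs constructed under a "do not regurgitate" style instruction, so that the preference for paraphrasing is learned as prompt-controllable rather than unconditional.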