

Poster in Workshop: The Impact of Memorization on Trustworthy Foundation Models

Are Samples Extracted From Large Language Models Memorized?

Chawin Sitawarin · Karan Chadha · John Morris · Saeed Mahloujifar · Chuan Guo

Sat 19 Jul 8:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Training large language models (LLMs) on diverse datasets, including news, books, and user data, enhances their capabilities but also raises significant privacy and copyright concerns due to their capacity to memorize training data. Current memorization measurements, primarily based on extraction attacks like Discoverable Memorization, focus on an LLM's ability to reproduce training data verbatim when prompted. While various extensions to these methods exist, allowing for different prompt forms and approximate matching, they introduce numerous parameters whose arbitrary selection significantly impacts reported memorization rates. This paper addresses the critical research question of how to compute the false positive rate (FPR) of these diverse memorization measurements. We propose a practical definition of the FPR and ways to interpret it, offering a more principled approach to selecting an extraction attack and its parameters. Our findings reveal that while "stronger" extraction attacks often identify more memorized samples, they also tend to have higher FPRs. Notably, some computationally intensive methods exhibit lower extraction rates than simpler baselines when controlling for a fixed FPR.
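For concreteness, below is a minimal sketch of the kind of verbatim extraction check that underlies Discoverable Memorization: prompt the model with a prefix of a training sample and test whether it greedily reproduces the true continuation. The model name, prefix/suffix lengths, and exact-match criterion here are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch of a discoverable-memorization (verbatim extraction) check.
# Assumptions: "gpt2" as a placeholder model, 50-token prefix/suffix split,
# and greedy decoding with an exact token-level match.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # placeholder; substitute the LLM under study
PREFIX_LEN = 50       # tokens of the training sample used as the prompt
SUFFIX_LEN = 50       # tokens the model must reproduce verbatim

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def is_discoverably_memorized(sample_text: str) -> bool:
    """Prompt with the sample's prefix and test for a verbatim suffix match."""
    ids = tokenizer(sample_text, return_tensors="pt").input_ids[0]
    if ids.shape[0] < PREFIX_LEN + SUFFIX_LEN:
        return False  # sample too short for this prefix/suffix split
    prefix = ids[:PREFIX_LEN].unsqueeze(0)
    true_suffix = ids[PREFIX_LEN:PREFIX_LEN + SUFFIX_LEN]
    with torch.no_grad():
        out = model.generate(
            prefix,
            max_new_tokens=SUFFIX_LEN,
            do_sample=False,  # greedy decoding
            pad_token_id=tokenizer.eos_token_id,
        )
    generated_suffix = out[0, PREFIX_LEN:PREFIX_LEN + SUFFIX_LEN]
    return torch.equal(generated_suffix, true_suffix)

# Extraction rate over a set of training samples:
#   rate = sum(is_discoverably_memorized(s) for s in samples) / len(samples)
```

In the paper's framing, the same measurement would also be applied to samples that are not in the training set to estimate a false positive rate, so that different attacks and parameter settings can be compared at a matched FPR.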
