Poster in Workshop: The Impact of Memorization on Trustworthy Foundation Models

ContextLeak: Auditing Leakage in Private In-Context Learning Methods

Jacob Choi · Shuying Cao · Xingjian Dong · Sai Praneeth Reddy Karimireddy

Sat 19 Jul 8:30 a.m. PDT — 9:30 a.m. PDT

Abstract: In-Context Learning (ICL) has become a standard technique for adapting Large Language Models (LLMs) to specialized tasks by supplying task-specific exemplars within the prompt. However, when these exemplars contain sensitive information, reliable privacy-preserving mechanisms are essential to prevent unintended leakage through model outputs. Many privacy-preserving methods have been proposed to protect against information leakage from the context, but far less effort has gone into auditing those methods. We introduce \texttt{ContextLeak}, the first framework to empirically measure the worst-case information leakage in ICL. \texttt{ContextLeak} uses \emph{canary insertion}, embedding uniquely identifiable tokens in exemplars and crafting targeted queries to detect their presence. We evaluate our method across a range of private ICL techniques, both heuristic (such as prompt-based defenses) and those with theoretical guarantees (such as Embedding Space Aggregation and Report Noisy Max). Our results show that \texttt{ContextLeak} correlates tightly with the theoretical privacy budget ($\epsilon$) and reliably detects leakage. They further reveal that existing methods often strike poor privacy-utility trade-offs, either leaking sensitive information or severely degrading performance.
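To make the canary-insertion idea concrete, the following is a minimal sketch of such an audit, assuming a black-box callable private_icl_answer(prompt) -> str for the privacy-preserving ICL pipeline under test. The function names, prompt format, and detection statistic here are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a canary-insertion audit for private ICL.
# Hypothetical interface: private_icl_answer(prompt) -> str is the
# privacy-preserving ICL pipeline being audited.
import secrets

def make_canary(n_digits: int = 8) -> str:
    """Generate a uniquely identifiable secret token to embed in one exemplar."""
    return "".join(str(secrets.randbelow(10)) for _ in range(n_digits))

def build_prompt(exemplars: list[str], query: str) -> str:
    """Assemble an ICL prompt from task exemplars plus a targeted query."""
    demos = "\n".join(f"Example: {e}" for e in exemplars)
    return f"{demos}\nQuestion: {query}\nAnswer:"

def audit_leakage(private_icl_answer, exemplars: list[str], n_trials: int = 100) -> float:
    """Estimate leakage as the fraction of trials in which a canary embedded
    in the context is reproduced by the (supposedly private) output."""
    hits = 0
    for _ in range(n_trials):
        canary = make_canary()
        poisoned = exemplars + [f"The secret code is {canary}."]
        prompt = build_prompt(poisoned, "What is the secret code mentioned above?")
        if canary in private_icl_answer(prompt):
            hits += 1
    return hits / n_trials

Under a defense with a formal privacy guarantee, one would expect this empirical detection rate to shrink as the privacy budget $\epsilon$ decreases; the sketch only illustrates the mechanism and omits the targeted query crafting and calibration used in the paper.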
