Poster in Workshop: The 2nd Workshop on Reliable and Responsible Foundation Models
Defending Against Prompt Injection with a Few DefensiveTokens
Sizhe Chen · Yizhu Wang · Nicholas Carlini · Chawin Sitawarin · David Wagner
Keywords: [ AI security ] [ Prompt injection defense ] [ LLM-integrated application ]
Large language models (LLMs) have unlocked many new possibilities in the software world and beyond. However, applications integrated with LLMs are also known to be vulnerable to a new class of attack: prompt injection. The best-known defenses fine-tune the LLM to be robust in the presence of attacks, which risks degrading utility and can make LLM providers wary of adopting them. Motivated by this, we propose DefensiveToken, a deployment-friendly defense that, as a first step, helps LLM providers secure LLMs without changing their parameters. Defensive tokens are newly inserted special tokens whose embeddings are optimized by our method to add security. Our scheme achieves prompt injection robustness comparable to fine-tuning the whole LLM while sacrificing minimal utility. When defensive tokens are not inserted, the LLM remains completely unchanged and thus produces responses of the same high quality as it normally does. Therefore, defensive tokens, if offered by the LLM provider, allow developers of LLM-integrated applications to decide when and where prompt injection security should be prioritized, changing the existing one-model-fits-all situation.
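To make the mechanism concrete, the sketch below illustrates the general idea of optimizing only a few inserted token embeddings while keeping the LLM frozen. This is not the authors' released code: the model name, the number of defensive tokens, the data, and the loss setup are illustrative assumptions.

```python
# Minimal sketch of the DefensiveToken idea (illustrative, not the authors' code):
# freeze the LLM and train only the embeddings of a few newly inserted tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"   # assumption: any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.requires_grad_(False)                        # LLM parameters stay unchanged

num_def_tokens = 5                                 # "a few" defensive tokens (assumed)
emb_dim = model.get_input_embeddings().embedding_dim
def_embeds = torch.nn.Parameter(                   # the only trainable weights
    0.01 * torch.randn(num_def_tokens, emb_dim)
)
optimizer = torch.optim.Adam([def_embeds], lr=1e-3)

def loss_on(prompt_with_injection: str, desired_response: str) -> torch.Tensor:
    """Prepend defensive-token embeddings to a (possibly attacked) prompt and
    score the desired, injection-ignoring response. Hypothetical helper."""
    prompt_ids = tok(prompt_with_injection, return_tensors="pt").input_ids
    target_ids = tok(desired_response, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, target_ids], dim=1)
    embeds = model.get_input_embeddings()(ids)
    embeds = torch.cat([def_embeds.unsqueeze(0), embeds], dim=1)
    labels = torch.cat(
        [torch.full((1, num_def_tokens + prompt_ids.size(1)), -100), target_ids],
        dim=1,
    )                                              # compute loss only on the response
    return model(inputs_embeds=embeds, labels=labels).loss

# The training loop and attack-simulating data are omitted. At deployment, the
# same def_embeds are prepended only when security is wanted and left out
# otherwise, so the base model and its usual outputs are untouched.
```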