Poster in Workshop: The 2nd Workshop on Reliable and Responsible Foundation Models
Defending Against Prompt Injection with a Few DefensiveTokens
Sizhe Chen · Yizhu Wang · Nicholas Carlini · Chawin Sitawarin · David Wagner
Keywords: [ AI security ] [ Prompt injection defense ] [ LLM-integrated application ]
Large language models (LLMs) have unlocked many new possibilities in the software world and beyond. However, applications integrated with LLMs are also known to be vulnerable to a new class of attack: prompt injection. The best-known defenses fine-tune the LLM to be robust in the presence of attacks, which risks degrading utility and can make LLM providers wary of adopting them. Motivated by this, we propose DefensiveToken, a deployment-friendly defense that, as a first step, helps LLM providers secure LLMs without changing their parameters. Defensive tokens are newly inserted special tokens whose embeddings are optimized by our method to add security. Our scheme achieves prompt injection robustness comparable to fine-tuning the whole LLM while sacrificing minimal utility. When defensive tokens are not inserted, the LLM remains completely unchanged and thus produces responses of the same high quality as it normally does. Therefore, defensive tokens, if offered by the LLM provider, allow developers of LLM-integrated applications to decide when and where prompt injection security should be prioritized, changing the existing one-model-fits-all situation.
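To make the mechanism concrete, the sketch below illustrates the general idea of optimizing only a few inserted token embeddings while keeping the LLM frozen. This is not the authors' released code: the model name, the number of defensive tokens, the data, and the loss setup are illustrative assumptions.

```python
# Minimal sketch of the DefensiveToken idea (illustrative, not the authors' code):
# freeze the LLM and train only the embeddings of a few newly inserted tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"   # assumption: any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.requires_grad_(False)                        # LLM parameters stay unchanged

num_def_tokens = 5                                 # "a few" defensive tokens (assumed)
emb_dim = model.get_input_embeddings().embedding_dim
def_embeds = torch.nn.Parameter(                   # the only trainable weights
    0.01 * torch.randn(num_def_tokens, emb_dim)
)
optimizer = torch.optim.Adam([def_embeds], lr=1e-3)

def loss_on(prompt_with_injection: str, desired_response: str) -> torch.Tensor:
    """Prepend defensive-token embeddings to a (possibly attacked) prompt and
    score the desired, injection-ignoring response. Hypothetical helper."""
    prompt_ids = tok(prompt_with_injection, return_tensors="pt").input_ids
    target_ids = tok(desired_response, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, target_ids], dim=1)
    embeds = model.get_input_embeddings()(ids)
    embeds = torch.cat([def_embeds.unsqueeze(0), embeds], dim=1)
    labels = torch.cat(
        [torch.full((1, num_def_tokens + prompt_ids.size(1)), -100), target_ids],
        dim=1,
    )                                              # compute loss only on the response
    return model(inputs_embeds=embeds, labels=labels).loss

# The training loop and attack-simulating data are omitted. At deployment, the
# same def_embeds are prepended only when security is wanted and left out
# otherwise, so the base model and its usual outputs are untouched.
```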