Poster in Workshop on Technical AI Governance
Fragile by Design: Formalizing Watermarking Tradeoffs via Paraphrasing
Ali Falahati · Lukasz Golab
Abstract:
Verification is a cornerstone of technical AI governance, enabling auditability, attribution, and accountability for AI-generated content. As generative models proliferate, watermarking has emerged as a leading strategy for tracing provenance. However, advanced paraphrasing methods pose a serious threat: they can erase watermarks without altering meaning. We model watermarking under paraphrasing as an adversarial game and prove a no-go theorem: under idealized conditions, no watermark can be both robust and imperceptible. Even with imperfect paraphrasers, robustness is fragile and easily broken. To navigate this tension, we propose the $\varepsilon$-$\delta$ framework, which quantifies the trade-off between robustness ($\varepsilon$) and semantic distortion ($\delta$). Our findings highlight a key asymmetry: removing a watermark is often easier than embedding one that survives. The $\varepsilon$-$\delta$ framework offers a principled foundation for evaluating watermarking in adversarial, safety-critical settings.
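One plausible reading of the $\varepsilon$-$\delta$ trade-off is sketched below; this is a hypothetical formalization for illustration, since the abstract does not state the paper's exact definitions. Here $W$ is an assumed embedding map, $A$ an assumed paraphrase attack, $\mathrm{Detect}$ an assumed detector, and $d$ an assumed semantic distance.

```latex
% Hypothetical formalization (not from the abstract): a watermarking scheme
% is \varepsilon-robust if a paraphrase attack A evades detection with
% probability at most \varepsilon, and \delta-faithful if embedding moves
% the text by at most \delta under a semantic distance d.
\[
  \Pr\big[\mathrm{Detect}\big(A(W(x))\big) = 0\big] \le \varepsilon,
  \qquad
  d\big(x,\, W(x)\big) \le \delta .
\]
```

Under this reading, the no-go theorem says that with an ideal paraphraser no scheme achieves both small $\varepsilon$ and small $\delta$ simultaneously.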