Poster in Workshop: Workshop on Technical AI Governance

Fragile by Design: Formalizing Watermarking Tradeoffs via Paraphrasing

Ali Falahati · Lukasz Golab


Abstract: Verification is a cornerstone of technical AI governance, enabling auditability, attribution, and accountability in AI-generated content. As generative models proliferate, watermarking has emerged as a leading strategy for tracing provenance. However, advanced paraphrasing methods pose a serious threat: they can erase watermarks without altering meaning. We model watermarking under paraphrasing as an adversarial game and prove a no-go theorem: under idealized conditions, no watermark can be both robust and imperceptible. Even with imperfect paraphrasers, robustness remains fragile and easily broken. To navigate this tension, we propose the $\varepsilon$-$\delta$ framework, which quantifies the trade-off between robustness ($\varepsilon$) and semantic distortion ($\delta$). Our findings highlight a key asymmetry: removing a watermark is often easier than embedding one that survives. The $\varepsilon$-$\delta$ framework offers a principled foundation for evaluating watermarking in adversarial, safety-critical settings.
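To make the trade-off concrete, one plausible reading of the $\varepsilon$-$\delta$ framework (a sketch under assumed notation; the symbols $W$, $D$, $P$, and $d$ are illustrative, not taken from the paper) is the following: a watermark embedder $W$ with detector $D$ is $\varepsilon$-robust against $\delta$-bounded paraphrasing if

$$\Pr\big[D\big(P(W(x))\big) = 1\big] \;\ge\; 1 - \varepsilon \qquad \text{for every paraphraser } P \text{ with } d\big(x,\, P(x)\big) \le \delta,$$

where $d$ is a semantic distance on texts. In this reading, the no-go theorem would say that imperceptibility (small $d(x, W(x))$) and robustness (small $\varepsilon$) cannot both hold once the adversary's budget $\delta$ exceeds the embedding's own distortion: any watermark confined to a small semantic ball around $x$ can be paraphrased away within that same ball.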
