PECCAVI: Overcoming the Brittleness of AI Image Watermarking Under Visual Paraphrasing Attacks
Abstract
By 2026, up to 90% of online content may be synthetically generated, raising serious concerns about the spread of AI-generated disinformation. Policymakers and companies alike are responding: California’s Bill AB 3211 mandates watermarking of AI-generated media, while firms such as Meta and Google are deploying watermarking systems to curb the misuse of generative models. Yet watermarking techniques remain fragile. In this work, we introduce and analyze a novel vulnerability: the visual paraphrase attack, a generative method capable of stripping both visible and invisible watermarks from AI-generated images. The attack operates in two steps: first, a caption is generated for the image; then, the image and its caption are passed to a diffusion-based text-to-image system, producing a visually similar but watermark-free image. Our empirical evaluation demonstrates that visual paraphrasing reliably removes watermarks while preserving the original image’s semantic content, revealing a fundamental weakness in current watermarking systems. To address this, we introduce PECCAVI, the first watermarking method explicitly designed to withstand visual paraphrase attacks. PECCAVI embeds robust, distortion-free watermarks within semantically stable regions of the image, which we term Non-Melting Points (NMPs). The method uses multi-channel frequency-domain watermarking and incorporates noisy burnishing to obfuscate watermark locations and resist reverse engineering. PECCAVI is model-agnostic and significantly more durable than existing approaches. We release the first visual paraphrase benchmark dataset and open-source all code and resources, offering a foundation for future work on robust watermarking in the age of generative AI.