PPM-CLIP: Probabilistic Prompt Modeling for Generalizable AI-Generated Image Detection
Abstract
The rapid rise of highly realistic AI-generated images necessitates reliable and generalizable detection methods. Existing methods, however, are constrained by their discriminative nature: by learning a single static decision boundary, they tend to memorize generator-specific artifacts and consequently fail to generalize to the unseen distributions of new generative models. To overcome this limitation, we propose PPM-CLIP, a framework that shifts detection from static classification to conditional generative modeling built on the CLIP vision–language model. Instead of learning a fixed decision boundary, a Probabilistic Prompt Modeling (PPM) module acts as a generator that produces an adaptive distribution of prompts conditioned on the input image, allowing the model to flexibly capture novel artifacts rather than matching them against fixed templates. In addition, a Patch-Wise Contrastive Learning (PWCL) strategy is introduced to enhance the visual encoder's sensitivity to subtle artifacts. Extensive experiments on the Ojha, GenImage, and DRCT benchmarks demonstrate that our generative paradigm significantly outperforms state-of-the-art methods, especially in cross-domain detection. Code will be released on GitHub.
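To make the core idea concrete, the sketch below illustrates one plausible reading of an image-conditioned prompt distribution: a Gaussian over prompt context vectors is predicted from the image feature, sampled via the reparameterization trick, and the pooled sample is scored against class embeddings. All dimensions, weight shapes, and the pooling/scoring steps are illustrative assumptions, not the paper's actual architecture.

```python
import math
import random

random.seed(0)

# Hypothetical sizes (not from the paper): embedding dim d, m prompt tokens.
d, m = 8, 4

def linear(x, W):
    """Plain matrix-vector product; W is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def ppm_sample(img_feat, W_mu, W_logvar):
    """Sketch of the probabilistic-prompt idea: condition a Gaussian over
    prompt context vectors on the image feature, then draw one sample via
    the reparameterization trick (mu + sigma * eps)."""
    mu = linear(img_feat, W_mu)
    logvar = linear(img_feat, W_logvar)
    z = [mi + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
         for mi, lv in zip(mu, logvar)]
    # Split the flat sample into m context vectors of size d.
    return [z[i * d:(i + 1) * d] for i in range(m)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Toy stand-ins for a CLIP image feature and the projection weights.
img_feat = [random.gauss(0, 1) for _ in range(d)]
W_mu = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(m * d)]
W_logvar = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(m * d)]

ctx = ppm_sample(img_feat, W_mu, W_logvar)

# Score the mean-pooled prompt sample against hypothetical "real"/"fake"
# class embeddings (a stand-in for a frozen text encoder).
pooled = [sum(tok[j] for tok in ctx) / m for j in range(d)]
real_emb = [random.gauss(0, 1) for _ in range(d)]
fake_emb = [random.gauss(0, 1) for _ in range(d)]
scores = [cosine(pooled, real_emb), cosine(pooled, fake_emb)]
probs = [math.exp(s) / sum(math.exp(t) for t in scores) for s in scores]
```

Because the prompt is a sample from an image-conditioned distribution rather than a fixed template, each input image induces its own comparison prompt, which is the adaptivity the abstract attributes to PPM.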