Zero-shot Detection of AI-Generated Images via RAW-RGB Alignment
Abstract
Advances in generative AI (GenAI) have made synthetic images increasingly difficult to identify, prompting numerous zero-/few-shot detection methods designed to counter unseen generators. However, we observe that existing detectors often misclassify synthetic images that have undergone physical transformations (e.g., print-and-scan) as real. This observation raises a fundamental question: should images remapped from the physical world back into digital space still be categorized as ``Synthetic''? More broadly, the definitions of real and synthetic images urgently need clarification. We propose that the authenticity of an image depends on whether it originates from the physical world, i.e., verifying authenticity requires verifying the original correlation between the digital image and the physical world. To this end, we analyze the physical-to-digital mapping process: illumination signals are captured by the camera sensor as RAW data, which is then converted into RGB data via the camera's internal parameters. This process embodies unique physical cues inherent to real scenes. Building on this, we propose a novel forensic feature, termed the alignment trace, constructed by modeling a shared RAW-RGB feature space. This trace captures the parameter correlations inherent to real images during physical-to-digital conversion, thereby indirectly verifying an image's physical origin. Experiments demonstrate that our method achieves state-of-the-art zero-shot detection using only real RAW-RGB data pairs, and that, given additional prior knowledge, it can be easily fine-tuned for better cross-domain performance. We hope this work provides a new baseline for zero-shot synthetic image detection and, more importantly, inspires the forensics community to explore the essential distinctions between real and synthetic images.
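To make the physical-to-digital mapping concrete, the following is a minimal toy sketch of an in-camera RAW-to-RGB conversion (white balance, color-correction matrix, gamma compression). The gains, matrix, and gamma value are illustrative placeholders, not parameters from the proposed method; real camera ISPs also include steps such as demosaicing and tone mapping that are omitted here.

```python
import numpy as np

def raw_to_rgb(raw,
               wb_gains=(2.0, 1.0, 1.5),              # illustrative white-balance gains
               ccm=np.array([[ 1.6, -0.4, -0.2],      # illustrative color-correction
                             [-0.3,  1.5, -0.2],      # matrix; rows sum to 1 so gray
                             [-0.1, -0.4,  1.5]]),    # stays gray
               gamma=1 / 2.2):
    """Map linear RAW values (H x W x 3, in [0, 1]) to display-referred RGB."""
    x = raw * np.asarray(wb_gains)        # per-channel white balance
    x = np.clip(x @ ccm.T, 0.0, 1.0)      # color correction, clipped to gamut
    return x ** gamma                     # gamma compression

# Tiny synthetic RAW patch for demonstration.
raw = np.random.default_rng(0).uniform(0.0, 0.5, size=(4, 4, 3))
rgb = raw_to_rgb(raw)
```

The key point for the alignment trace is that these conversion parameters are correlated with the captured scene in real images, whereas a generated RGB image has no underlying RAW counterpart with consistent parameters.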