Detect Any AI-Counterfeited Text Image
Abstract
The rapid advancement of generative AI enables the creation of highly realistic text images, posing significant security risks from fraud and disinformation. However, research into robust detection is critically hampered by existing datasets that lack scale, diversity, and up-to-date counterfeit techniques, as well as by models that fail to generalize. To address these deficiencies, we introduce DanceText, a large-scale, comprehensive dataset for AI-counterfeited text image detection. Constructed using our novel Creative Proposer pipeline, which automates the generation of diverse and realistic counterfeits, DanceText exceeds prior datasets by more than 100-fold along multiple dimensions. It is the first dataset to include counterfeits from multimodal large models, commercial software, and mobile apps, and it covers all major counterfeiting paradigms, including full-image generation, regional removal, and editing. Building on this dataset, we propose DS-Net, a novel and effective detection model with two key components: a Forensic Decoupling Encoder that extracts generator-agnostic artifact features, and a Synergy Denoising Decoder that synergizes image-level classification with instance-level localization. Extensive experiments demonstrate that DS-Net achieves state-of-the-art performance, advancing the field toward robust and unified detection of AI-counterfeited text images in real-world scenarios. Our code and dataset will be publicly released.