Poster
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
Zhenglin Huang · Jinwei Hu · Yiwei He · Xiangtai Li · Xiaowei Huang · Bei Peng · Xingyu Zhao · Baoyuan Wu · Guangliang Cheng
Abstract:
The rapid advancement of generative models in creating highly realistic images poses substantial risks for misinformation dissemination. For instance, a synthetic image, when shared on social media, can mislead large audiences and erode trust in digital content, resulting in severe repercussions. Despite some progress, academia has not yet created a large and diverse deepfake detection dataset for social media, nor has it devised an effective solution to this problem. In this paper, we introduce the $\textbf{S}$ocial media $\textbf{I}$mage $\textbf{D}$etection data$\textbf{Set}$ (SID-Set), which offers three key advantages: (1) $\textbf{extensive volume}$, featuring 300K AI-generated/tampered and authentic images with comprehensive annotations, (2) $\textbf{broad diversity}$, encompassing fully synthetic and tampered images across various classes, and (3) $\textbf{elevated realism}$, with images that are largely indistinguishable from genuine ones by visual inspection alone. Furthermore, leveraging the exceptional capabilities of large multimodal models, we propose a new image deepfake detection, localization, and explanation framework, named SIDA ($\textbf{S}$ocial media $\textbf{I}$mage $\textbf{D}$etection, localization, and explanation $\textbf{A}$ssistant). SIDA not only discerns the authenticity of images but also delineates tampered regions through mask prediction and provides textual explanations of the model's judgment criteria. Extensive experiments on SID-Set and other benchmarks demonstrate that SIDA outperforms state-of-the-art deepfake detection models across diverse settings. The code, model, and dataset will be released.
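To make the framework's three-part output concrete, below is a minimal sketch of the inference contract the abstract describes: an authenticity label, a tamper mask, and a textual explanation. Since the code has not yet been released, every name here (`SIDAResult`, `run_sida`, `_StubSIDA`, the label set, and the model's output signature) is a hypothetical assumption for illustration, not the authors' API.

```python
# Hypothetical sketch of a SIDA-style inference interface. The model is
# assumed to return class logits, a per-pixel tamper-logit map, and an
# explanation string -- the three outputs described in the abstract.
from dataclasses import dataclass

import torch

# Assumed label set: authentic, fully synthetic, or locally tampered.
LABELS = ("real", "fully_synthetic", "tampered")


@dataclass
class SIDAResult:
    label: str            # one of LABELS
    mask: torch.Tensor    # (H, W) probabilities; non-zero only if tampered
    explanation: str      # the model's stated judgment criteria


def run_sida(model: torch.nn.Module, image: torch.Tensor) -> SIDAResult:
    """Run one (3, H, W) image through a SIDA-like model."""
    logits, mask_logits, explanation = model(image.unsqueeze(0))
    label = LABELS[int(logits.argmax(dim=-1))]
    mask = torch.sigmoid(mask_logits)[0]
    if label != "tampered":
        # Localization only applies to tampered images; blank the mask
        # for real and fully synthetic predictions.
        mask = torch.zeros_like(mask)
    return SIDAResult(label, mask, explanation)


class _StubSIDA(torch.nn.Module):
    """Toy stand-in so the sketch runs end to end; not the real model."""

    def forward(self, x):
        b, _, h, w = x.shape
        return torch.randn(b, len(LABELS)), torch.randn(b, h, w), "stub explanation"


if __name__ == "__main__":
    result = run_sida(_StubSIDA(), torch.rand(3, 224, 224))
    print(result.label, result.mask.shape, result.explanation)
```

The blanked mask for non-tampered predictions reflects the abstract's framing: fully synthetic images have no meaningful tampered region, so localization is only reported for the tampered class.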