VMD-FACT: A New Video Dataset and MLLM-based method for Detecting Realistic AI-Generated Video Misinformation
Abstract
The rapid evolution of generative AI, exemplified by models such as Sora, has intensified the threat of video misinformation. A critical challenge in detecting AI-generated video misinformation lies in a fundamental disconnect between existing datasets and practical deception tactics. Current datasets often disrupt cross-modal consistency through editing techniques, resulting in unrealistic and easily detectable artifacts. In stark contrast, generative video misinformation strives for semantic consistency across modalities to maintain realism. To address this gap, we introduce RAVM, the first Realistic AI-Generated Video Misinformation detection dataset. RAVM contains authentic claim-video pairs as well as realistic AI-generated claim-video pairs. More importantly, unlike existing Video Misinformation Detection (VMD) datasets that are limited to single-source manipulations, RAVM encompasses multiple manipulation sources (Claim, Video, Audio, and Cross-Modal Manipulation), each of which includes multiple manipulation techniques for generating realistic AI-generated video misinformation. To this end, we also introduce an AI-generative framework for producing realistic AI-generated video misinformation. Furthermore, we propose IEEG, a model that represents multimodal evidence, fact-checking results, and their dependencies as an evidence graph for interpretable AI-generated VMD. Extensive experiments on RAVM demonstrate the vulnerability of general Multimodal Large Language Models (MLLMs) to generative video misinformation, while our IEEG achieves state-of-the-art performance on RAVM.