Poster
Rethink Visual-language Pretraining for Deepfake Detection: Multi-modal Interpretable Forged Face Detection
Xiao Guo · Xiufeng Song · Yue Zhang · Xiaohong Liu · Xiaoming Liu
Deepfake detection is a long-established research topic crucial for combating the spread of malicious misinformation. Unlike previous methods, which provide either binary classification results or textual explanations for deepfake detection, we propose a novel method that delivers both simultaneously. Our method harnesses the multi-modal learning power of the pre-trained CLIP and the interpretability of large language models (LLMs) to enhance both the generalization and the interpretability of deepfake detection. Specifically, we introduce a multi-modal face forgery detector (M2F2-Det) that employs specially designed face forgery prompt learning, integrating the zero-shot capabilities of the pre-trained CLIP to improve generalization to unseen forgeries. In addition, M2F2-Det incorporates an LLM to provide detailed explanations of its detection decisions, offering strong interpretability by bridging the gap between natural language and the subtle cues of facial forgery. Empirically, we evaluate M2F2-Det on both the detection and the sentence-generation tasks, achieving state-of-the-art performance on both and demonstrating its effectiveness in detecting and explaining diverse and unseen forgeries. Code and models will be released upon publication.
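To make the detection branch concrete, below is a minimal sketch of what CLIP-based face forgery prompt learning could look like: a frozen CLIP model with learnable prompt context vectors spliced into "real face" / "fake face" text prompts (CoOp-style), scoring images by image-prompt cosine similarity. The class name ForgeryPromptDetector, the prompt templates, the context length n_ctx, and the CoOp-style design are all illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch only: a frozen CLIP with a learnable "face forgery"
# prompt, in the spirit of the prompt learning described in the abstract.
# All names and design choices here are assumptions for illustration.
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git


class ForgeryPromptDetector(nn.Module):
    """Frozen CLIP + learnable prompt context for real/fake scoring (assumed design)."""

    def __init__(self, clip_model, n_ctx: int = 8):
        super().__init__()
        self.clip = clip_model.float()
        for p in self.clip.parameters():  # keep CLIP entirely frozen
            p.requires_grad_(False)
        dim = self.clip.token_embedding.embedding_dim
        # Learnable context vectors shared by the two class prompts.
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, dim))
        # "X" placeholders reserve token slots that the learned context replaces.
        placeholder = " ".join(["X"] * n_ctx)
        prompts = [f"{placeholder} real face.", f"{placeholder} fake face."]
        self.register_buffer("tokens", clip.tokenize(prompts))
        self.n_ctx = n_ctx

    def encode_prompts(self):
        emb = self.clip.token_embedding(self.tokens)  # (2, 77, dim)
        # Splice the learnable context in right after the start-of-text token.
        ctx = self.ctx.unsqueeze(0).expand(emb.size(0), -1, -1)
        x = torch.cat([emb[:, :1], ctx, emb[:, 1 + self.n_ctx:]], dim=1)
        x = x + self.clip.positional_embedding
        x = self.clip.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
        x = self.clip.ln_final(x)
        eot = self.tokens.argmax(dim=-1)  # EOT token has the largest token id
        return x[torch.arange(x.size(0)), eot] @ self.clip.text_projection

    def forward(self, images):
        img = self.clip.encode_image(images)
        txt = self.encode_prompts()
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        # Real-vs-fake logits from image/prompt cosine similarity.
        return self.clip.logit_scale.exp() * img @ txt.t()


model, _ = clip.load("ViT-B/32", device="cpu")
detector = ForgeryPromptDetector(model)
logits = detector(torch.randn(1, 3, 224, 224))  # dummy preprocessed face crop
print(logits.softmax(dim=-1))                   # P(real), P(fake)
```

Only the context vectors are trainable here, which is what lets such a detector inherit CLIP's zero-shot generalization while adapting its prompts to forgery cues; the LLM explanation branch described above is a separate component not sketched here.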