DeepfakeImpact: A Two-Stage Benchmark with Real-World Impact in Deepfake Detection
Abstract
A fundamental yet overlooked limitation of current deepfake detection benchmarks is the lack of evaluation frameworks that align technical accuracy with real-world impact. We argue that technical metrics may fail to capture models' actual capacity to mitigate real-world harm, as they treat all errors as equally significant. To bridge this gap, we introduce DeepfakeImpact, a two-stage benchmark that moves beyond pure technical evaluation toward societally-aware assessment. In Stage I, we establish standardized technical baselines by evaluating 33 state-of-the-art detectors across 12 widely used datasets. In Stage II, we propose a novel metric, Social Misjudgment Impact (SMI), that quantifies the potential social harm of misclassified videos, and construct an SMI-critical dataset containing high-risk samples. By integrating SMI-aware performance metrics, we shift the evaluation focus from ``how accurate'' to ``how socially beneficial'' a detector is. DeepfakeImpact thus provides a more realistic and ethically grounded foundation for assessing deepfake detectors, urging the community to rethink what truly constitutes progress in this field. All resources will be publicly released at: \url{https://anonymous.4open.science/r/DeepfakeImpact-Stage1-F5EC}.