

Poster

Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation

Yiheng Li · Yang Yang · Zichang Tan · Huan Liu · Weihua Chen · Xu Zhou · Zhen Lei

ExHall D Poster #369
Fri 13 Jun 2 p.m. PDT — 4 p.m. PDT

Abstract: To tackle the threat of fake news, the task of detecting and grounding multi-modal media manipulation (DGM$^4$) has received increasing attention. However, most state-of-the-art methods fail to explore the fine-grained consistency within local contents, usually resulting in an inadequate perception of detailed forgery and unreliable results. In this paper, we propose a novel approach named Contextual-Semantic Consistency Learning (CSCL) to enhance the fine-grained forgery perception ability for DGM$^4$. Two branches for the image and text modalities are established, each of which contains two cascaded decoders, i.e., a Contextual Consistency Decoder (CCD) and a Semantic Consistency Decoder (SCD), to capture within-modality contextual consistency and cross-modality semantic consistency, respectively. Both CCD and SCD adhere to the same criteria for capturing fine-grained forgery details. Specifically, each module first constructs consistency features by leveraging additional supervision from the heterogeneous information of each token pair. Then, forgery-aware reasoning or aggregation is adopted to deeply seek forgery cues based on the consistency features. Extensive experiments on DGM$^4$ datasets prove that CSCL achieves new state-of-the-art performance, especially for grounding manipulated content.
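The two kinds of consistency the abstract describes can be sketched in a minimal form: within-modality contextual consistency compares every token pair inside one modality, while cross-modality semantic consistency compares image tokens against text tokens. The NumPy sketch below is illustrative only; the function names, cosine-similarity choice, and token shapes are assumptions, not the paper's actual decoders, which learn these features with additional supervision.

```python
import numpy as np

def cosine_sim_matrix(a, b):
    # Pairwise cosine similarity between token sets a (N, d) and b (M, d).
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T

def consistency_features(img_tokens, txt_tokens):
    # Within-modality contextual consistency maps (cf. CCD) and a
    # cross-modality semantic consistency map (cf. SCD), each built
    # from token-pair similarities; shapes and names are illustrative.
    ctx_img = cosine_sim_matrix(img_tokens, img_tokens)  # (Ni, Ni)
    ctx_txt = cosine_sim_matrix(txt_tokens, txt_tokens)  # (Nt, Nt)
    sem = cosine_sim_matrix(img_tokens, txt_tokens)      # (Ni, Nt)
    return ctx_img, ctx_txt, sem

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 8))  # 4 image tokens, embedding dim 8
txt = rng.standard_normal((6, 8))  # 6 text tokens, embedding dim 8
ctx_i, ctx_t, sem = consistency_features(img, txt)
print(ctx_i.shape, ctx_t.shape, sem.shape)  # (4, 4) (6, 6) (4, 6)
```

In the paper, such consistency features then feed forgery-aware reasoning (for grounding manipulated regions or tokens) or aggregation (for binary manipulation detection); this sketch only shows the raw token-pair structure those modules operate on.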
