Poster
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
Likai Tian · Jian Zhao · Zechao Hu · Zhengwei Yang · Hao Li · Lei Jin · Zheng Wang · Xuelong Li
Composed Image Retrieval (CIR) is a multi-modal task that seeks to retrieve target images by harmonizing a reference image with a modified instruction. The main challenge in CIR lies in compositional conflicts between the attributes of the reference image (e.g., blue, long sleeve) and those requested by the modified instruction (e.g., grey, short sleeve). Previous works attempt to mitigate such conflicts through feature-level manipulation, commonly employing learnable masks to obscure conflicting features within the reference image. However, the inherent complexity of feature spaces makes precise conflict neutralization difficult, leading to uncontrollable results. To this end, this paper proposes the Compositional Conflict Identification and Neutralization (CCIN) framework, which sequentially identifies and neutralizes compositional conflicts for effective CIR. Specifically, CCIN comprises two core modules: 1) a Compositional Conflict Identification module, which utilizes LLM-based analysis to identify the specific conflicting attributes, and 2) a Compositional Conflict Neutralization module, which first generates a kept instruction to preserve non-conflicting attributes, then neutralizes conflicts under the collaborative guidance of both the kept and modified instructions. Furthermore, an orthogonal parameter regularization loss is introduced to emphasize the distinction between target and conflicting features. Extensive experiments demonstrate the superiority of CCIN over state-of-the-art methods.
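The abstract does not give the form of the orthogonal parameter regularization loss. As a rough illustration of the general idea only, the sketch below computes one common orthogonality penalty: the squared Frobenius norm of A^T B for two parameter matrices A and B, which is zero exactly when every column of A is orthogonal to every column of B. The function names, the choice of penalty, and which parameters it would be applied to are all assumptions made for illustration, not the formulation used in CCIN.

```python
from typing import List

Matrix = List[List[float]]  # row-major: matrix[row][col]


def cross_gram(a: Matrix, b: Matrix) -> Matrix:
    """Return A^T B for two matrices that share the same number of rows."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of rows")
    cols_a, cols_b = len(a[0]), len(b[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(len(a))) for j in range(cols_b)]
        for i in range(cols_a)
    ]


def orthogonality_penalty(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of A^T B (hypothetical regularizer, not CCIN's)."""
    return sum(v * v for row in cross_gram(a, b) for v in row)
```

Under this assumed form, adding the penalty to a training objective would push the two parameter sets toward mutually orthogonal directions, which is one way a regularizer could separate two groups of features.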