

Poster

Learning with Noisy Triplet Correspondence for Composed Image Retrieval

Shuxian Li · Changhao He · Xiting Liu · Joey Tianyi Zhou · Xi Peng · Peng Hu


Abstract: Composed Image Retrieval (CIR) enables editable image search by integrating a query pair, a reference image (ref) and a textual modification (mod), to retrieve a target image (tar) that reflects the intended change. While existing CIR methods have shown promising performance using well-annotated triplets (ref, mod, tar), almost all of them implicitly assume that the components of each triplet are accurately associated with one another. In practice, however, this assumption is often violated due to the limited knowledge of annotators, inevitably leading to incorrect textual modifications and giving rise to a practical yet under-explored problem: noisy triplet correspondence (NTC). To tackle this challenge, we propose a Task-oriented Modification Enhancement framework (TME) that learns robustly from noisy triplets and comprises three key modules: Robust Fusion Query (RFQ), Pseudo Text Enhancement (PTE), and Task-Oriented Prompt (TOP). Specifically, to mitigate the adverse impact of noise, RFQ employs a sample selection strategy to divide the training triplets into clean and noisy sets, enhancing the reliability of the training data for robust learning. To further leverage the noisy data instead of discarding it, PTE unifies the triplet noise as an adapter mismatch problem, adjusting mod to align with ref and tar in each mismatched triplet. Finally, TOP replaces ref in the clean set with a trainable prompt, which is concatenated with mod to form a query independent of the visual reference, thereby mitigating visually irrelevant noise. Extensive experiments on two domain-specific datasets demonstrate the robustness and superiority of TME for the CIR task, particularly in noisy scenarios.
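The abstract does not specify how RFQ's sample selection separates clean from noisy triplets. A common heuristic in noisy-correspondence learning is to fit a two-component Gaussian mixture to per-triplet training losses and treat the low-loss component as clean; the sketch below illustrates that idea only. The function name split_clean_noisy, the 0.5 threshold, and the GMM criterion are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_triplet_losses, threshold=0.5):
    """Divide training triplets into 'clean' and 'noisy' index sets.

    Fits a two-component GMM to normalized per-triplet losses and treats
    the low-mean component as clean (an assumed heuristic; the paper's
    actual selection criterion may differ).
    """
    losses = np.asarray(per_triplet_losses, dtype=np.float64).reshape(-1, 1)
    # Min-max normalize so the two mixture components share a common scale.
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    # Posterior probability of the low-loss (presumed clean) component.
    clean_comp = int(np.argmin(gmm.means_))
    p_clean = gmm.predict_proba(losses)[:, clean_comp]
    clean_idx = np.flatnonzero(p_clean >= threshold)
    noisy_idx = np.flatnonzero(p_clean < threshold)
    return clean_idx, noisy_idx
```

Under this reading, triplets in clean_idx would feed RFQ's fused-query training directly, while those in noisy_idx would be routed to PTE for modification-text adjustment rather than being discarded.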
