Poster
Learning with Noisy Triplet Correspondence for Composed Image Retrieval
Shuxian Li · Changhao He · Xiting Liu · Joey Tianyi Zhou · Xi Peng · Peng Hu
Abstract:
Composed Image Retrieval (CIR) enables editable image search by integrating a query pair, a reference image and a textual modification, to retrieve a target image that reflects the intended change. While existing CIR methods have shown promising performance with well-annotated triplets, almost all of them implicitly assume that the components of each triplet are accurately associated with one another. In practice, however, this assumption is often violated due to the limited knowledge of annotators, inevitably leading to incorrect textual modifications and resulting in a practical yet less-explored problem: noisy triplet correspondence (NTC). To tackle this challenge, we propose a Task-oriented Modification Enhancement framework (TME) that learns robustly from noisy triplets via three key modules: Robust Fusion Query (RFQ), Pseudo Text Enhancement (PTE), and Task-Oriented Prompt (TOP). Specifically, to mitigate the adverse impact of noise, RFQ employs a sample-selection strategy to divide the training triplets into clean and noisy sets, thus enhancing the reliability of the training data for robust learning. To further exploit the noisy data instead of discarding it, PTE unifies the triplet noise as an adapter-mismatch problem, adjusting the textual modification so that it aligns with the reference and target images in the mismatched triplet. Finally, TOP replaces the reference image in the clean set with a trainable prompt, which is then concatenated with the textual modification to form a query independent of the visual reference, thereby mitigating visually irrelevant noise. Extensive experiments on two domain-specific datasets demonstrate the robustness and superiority of our TME for the CIR task, particularly in noisy scenarios.
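The sample-selection step in RFQ can be illustrated with a minimal sketch. The abstract does not specify the selection criterion, so the snippet below assumes a common noisy-label heuristic: triplets with low per-sample loss are treated as clean and those with high loss as noisy, separated here by a simple two-cluster split over the loss values (the function name and criterion are illustrative, not the paper's actual implementation).

```python
def split_clean_noisy(losses):
    """Partition training samples into clean and noisy index sets by a
    two-cluster 1-D k-means over per-triplet losses (hypothetical
    small-loss criterion; the paper's actual rule may differ)."""
    # Initialize cluster centers at the extremes: clean = low loss,
    # noisy = high loss.
    c_clean, c_noisy = min(losses), max(losses)
    for _ in range(50):  # iterate assignment / re-centering to convergence
        clean = [l for l in losses if abs(l - c_clean) <= abs(l - c_noisy)]
        noisy = [l for l in losses if abs(l - c_clean) > abs(l - c_noisy)]
        if clean:
            c_clean = sum(clean) / len(clean)
        if noisy:
            c_noisy = sum(noisy) / len(noisy)
    # Threshold midway between the two converged centers.
    thr = (c_clean + c_noisy) / 2
    clean_idx = [i for i, l in enumerate(losses) if l <= thr]
    noisy_idx = [i for i, l in enumerate(losses) if l > thr]
    return clean_idx, noisy_idx
```

In TME, the clean set would then feed RFQ and TOP directly, while the noisy set would be routed to PTE for text adjustment rather than being discarded.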