Poster
Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction
Shanshan Huang · Haoxuan Li · Chunyuan Zheng · Mingyuan Ge · Wei Gao · Lei Wang · Li Liu
Fashion image editing is a valuable tool for designers to convey their creative ideas by visualizing design concepts. With recent advances in text-based editing methods, significant progress has been made in fashion image editing. However, existing methods face two key challenges: spurious correlations in the training data often induce changes in other regions when editing a given concept, and these models typically lack the ability to edit multiple concepts simultaneously. To address these challenges, we propose a novel Text-driven Fashion Image ediTing framework, T-FIT, which mitigates the impact of spurious correlations by integrating counterfactual reasoning with compositional concept learning, enabling precise compositional multi-concept fashion image editing that relies solely on text descriptions. Specifically, T-FIT includes three key components: (i) a counterfactual abduction module, which learns an exogenous variable of the source image via a denoising U-Net model; (ii) a concept learning module, which identifies concepts in fashion image editing (such as clothing types and colors) and projects a target concept into the space spanned by a series of textual prompts; and (iii) a concept composition module, which enables simultaneous adjustment of multiple concepts by aggregating each concept's direction vector obtained from the concept learning module. Extensive experiments demonstrate that our method efficiently achieves state-of-the-art performance on various fine-grained fashion image editing tasks, including single-concept editing (e.g., sleeve length, clothing type) and multi-concept editing (e.g., color & sleeve length, fabric & clothing type).
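The abstract does not give the concept composition module's implementation details; the sketch below is only a minimal illustration of the aggregation idea it describes, assuming each concept's direction vector is a normalized difference of text embeddings and that `encode_text` is a hypothetical stand-in for a CLIP-style text encoder.

```python
# Minimal sketch (not the authors' implementation): per-concept direction
# vectors in a text-embedding space are aggregated into one edit direction,
# enabling simultaneous multi-concept editing.

import numpy as np

def encode_text(prompt: str) -> np.ndarray:
    """Hypothetical text encoder returning a unit-norm embedding (random stand-in)."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

def concept_direction(source_prompt: str, target_prompt: str) -> np.ndarray:
    """Direction for one concept: target embedding minus source embedding, normalized."""
    d = encode_text(target_prompt) - encode_text(source_prompt)
    return d / np.linalg.norm(d)

def compose_concepts(pairs, weights=None) -> np.ndarray:
    """Aggregate per-concept direction vectors into a single composite edit direction."""
    weights = weights or [1.0] * len(pairs)
    agg = sum(w * concept_direction(s, t) for w, (s, t) in zip(weights, pairs))
    return agg / np.linalg.norm(agg)

# Example: compose a color edit and a sleeve-length edit into one direction.
edit_direction = compose_concepts([
    ("a photo of a dress", "a photo of a red dress"),
    ("a photo of a dress", "a photo of a long-sleeved dress"),
])
```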