Poster
Visual Representation Learning through Causal Intervention for Controllable Image Editing
Shanshan Huang · Haoxuan Li · Chunyuan Zheng · Lei Wang · Guorui Liao · Zhili Gong · Huayi Yang · Li Liu
A key challenge for controllable image editing is that visual attributes with semantic meanings are not always independent of each other, resulting in spurious correlations in model training. However, most existing methods ignore this issue, leading to biased causal representation learning and unintended changes to unrelated features in the edited images. This leads us to present a diffusion-based causal representation learning framework called CIDiffuser that employs structural causal models (SCMs) to capture causal representations of visual attributes and thereby address the spurious correlations. The framework first adopts a semantic encoder to decompose the representation into a target part, which includes the visual attributes of interest to the user, and an "other" part. We then introduce a direct causal effect learning module to capture the total direct causal effect between the potential outcomes before and after intervening on the visual attributes. In addition, a diffusion-based learning strategy is designed to optimize the representation learning process. Empirical evaluations on two benchmark datasets and one in-house dataset suggest that our approach significantly outperforms state-of-the-art methods, enabling controllable image editing by modifying the learned visual representations.
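Since the abstract only names the framework's components, the following is a minimal PyTorch sketch of how the three pieces could fit together: a semantic encoder that splits the latent code into a target and an "other" part, a total-direct-effect-style contrast of potential outcomes under an intervention on the target code, and a diffusion-style denoising loss. All module names, dimensions, losses, and the noising schedule here are hypothetical illustrations under stated assumptions; the paper's actual CIDiffuser architecture is not reproduced.

```python
# Hypothetical sketch of the abstract's three ingredients; not the authors' code.
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):
    """Encodes an image and splits the code into a 'target' part
    (attributes the user wants to edit) and an 'other' part."""
    def __init__(self, in_dim=3 * 64 * 64, target_dim=16, other_dim=48):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, target_dim + other_dim))
        self.target_dim = target_dim

    def forward(self, x):
        z = self.net(x)
        return z[:, :self.target_dim], z[:, self.target_dim:]  # (z_target, z_other)

class Decoder(nn.Module):
    """Maps the concatenated latent code back to image space."""
    def __init__(self, z_dim=64, out_dim=3 * 64 * 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))

    def forward(self, z_target, z_other):
        return self.net(torch.cat([z_target, z_other], dim=-1))

def total_direct_effect(decoder, z_target, z_target_do, z_other):
    """Contrast of potential outcomes before and after intervening on the
    target attributes, holding the 'other' factors fixed (a TDE-style contrast)."""
    y_factual = decoder(z_target, z_other)
    y_counterfactual = decoder(z_target_do, z_other)
    return y_counterfactual - y_factual

# --- one illustrative training step (simplified noise-prediction objective) ---
enc, dec = SemanticEncoder(), Decoder()
eps_net = nn.Sequential(nn.Linear(64 + 1, 256), nn.ReLU(), nn.Linear(256, 64))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters(),
                        *eps_net.parameters()], lr=1e-4)

x = torch.rand(8, 3, 64, 64)                       # a batch of images
z_t, z_o = enc(x)
z = torch.cat([z_t, z_o], dim=-1)

t = torch.rand(8, 1)                               # diffusion time in [0, 1]
noise = torch.randn_like(z)
z_noisy = (1 - t) * z + t * noise                  # toy forward (noising) process
diff_loss = ((eps_net(torch.cat([z_noisy, t], dim=-1)) - noise) ** 2).mean()

z_t_do = torch.randn_like(z_t)                     # do(z_target := random draw)
tde = total_direct_effect(dec, z_t, z_t_do, z_o)
# Hypothetical regularizer: encourage interventions on the target code to
# actually move the output, so edits stay attached to the target attributes.
causal_loss = -tde.abs().mean()

opt.zero_grad()
(diff_loss + 0.1 * causal_loss).backward()
opt.step()
```

At edit time, one would encode an image, overwrite only `z_target` with the desired attribute value, and decode; because the "other" part is held fixed, unrelated features should remain unchanged, which is the controllability property the abstract claims.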