Skip to yearly menu bar Skip to main content


Poster

Multi-party Collaborative Attention Control for Image Customization

Han Yang · Chuanguang Yang · Qiuli Wang · Zhulin An · Weilun Feng · Libo Huang · Yongjun Xu


Abstract:

The rapid development of diffusion models has fueled a growing demand for customized image generation. However, current customization methods face several limitations: 1) typically accept either image or text conditions alone; 2) customization in complex visual scenarios often leads to subject leakage or confusion; 3) image-conditioned outputs tend to suffer from inconsistent backgrounds; and 4) high computational costs. To address these issues, this paper introduces Multi-party Collaborative Attention Control (MCA-Ctrl), a tuning-free approach that enables high-quality image customization under both text and complex visual conditions. Specifically, MCA-Ctrl leverages two key operations within the self-attention layer to coordinate multiple parallel diffusion processes and guide the target image generation. This approach allows MCA-Ctrl to capture the content and appearance of specific subjects while maintaining semantic consistency with the conditional input. Additionally, to mitigate subject leakage and confusion issues common in complex visual scenarios, we introduce a Subject Localization Module that extracts precise subject and editable image layers based on user instructions. Extensive quantitative and human evaluation experiments demonstrate that MCA-Ctrl outperforms previous methods in zero-shot image customization, effectively addressing the aforementioned issues.

Live content is unavailable. Log in and register to view live content