

Poster

MoEdit: On Learning Quantity Perception for Multi-object Image Editing

Yanfeng Li · Ka-Hou Chan · Yue Sun · Chan-Tong Lam · Tong Tong · Zitong YU · Keren Fu · Xiaohong Liu · Tao Tan


Abstract:

Multi-object images are widely present in the real world, spanning various areas of daily life. Efficient and accurate editing of these images is crucial for applications such as augmented reality, advertisement design, and medical imaging. Stable Diffusion (SD) has ushered in a new era of high-quality image generation and editing. However, existing methods struggle to analyze abstract relationships between multiple objects, often yielding suboptimal performance. To address this, we propose MoEdit, an auxiliary-free method for multi-object image editing. MoEdit enables high-quality editing of attributes in multi-object images, such as style, object features, and background, while maintaining quantity consistency between the input and output images. To preserve the concept of quantity, we introduce a Quantity Attention (QTTN) module to control the editing process. Additionally, we present a Feature Compensation (FeCom) module to enhance the robustness of object preservation. We also employ a null-text training strategy to retain the guiding capability of text prompts, making both modules plug-and-play. MoEdit leverages the powerful image perception and generation capabilities of the SD model, enabling specific concepts to be preserved or altered between input and output images. Experimental results demonstrate that MoEdit achieves state-of-the-art performance in high-quality multi-object image editing. Code will be available upon acceptance.
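The abstract does not disclose MoEdit's implementation, so the following is only a minimal, hypothetical sketch of the general idea it describes: a plug-and-play attention module that injects auxiliary (here, quantity-aware) features into a frozen Stable Diffusion UNet block as a residual. The class name, tensor shapes, zero-initialized output projection, and injection point are all assumptions for illustration, not the authors' QTTN or FeCom design.

```python
# Hypothetical sketch, not the MoEdit implementation: illustrates a plug-and-play
# cross-attention adapter that injects auxiliary "quantity" features into a
# frozen diffusion UNet block's hidden states as a residual.

import torch
import torch.nn as nn


class QuantityAttentionSketch(nn.Module):
    """Cross-attends UNet hidden states to quantity features and adds the
    result as a residual, leaving the base SD weights untouched."""

    def __init__(self, hidden_dim: int, quantity_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=hidden_dim, num_heads=num_heads,
            kdim=quantity_dim, vdim=quantity_dim, batch_first=True,
        )
        self.norm = nn.LayerNorm(hidden_dim)
        # Zero-initialized output projection so the adapter starts as a no-op,
        # a common trick for plug-and-play modules added to a pretrained model.
        self.out_proj = nn.Linear(hidden_dim, hidden_dim)
        nn.init.zeros_(self.out_proj.weight)
        nn.init.zeros_(self.out_proj.bias)

    def forward(self, hidden_states: torch.Tensor,
                quantity_tokens: torch.Tensor) -> torch.Tensor:
        # hidden_states: (B, L, hidden_dim) flattened spatial tokens of a UNet block
        # quantity_tokens: (B, N, quantity_dim) features encoding the object count
        attn_out, _ = self.attn(self.norm(hidden_states),
                                quantity_tokens, quantity_tokens)
        return hidden_states + self.out_proj(attn_out)


if __name__ == "__main__":
    # Toy shapes standing in for one UNet block; all values are placeholders.
    module = QuantityAttentionSketch(hidden_dim=320, quantity_dim=768)
    h = torch.randn(2, 64 * 64, 320)   # flattened 64x64 feature map
    q = torch.randn(2, 4, 768)         # placeholder quantity features
    print(module(h, q).shape)          # torch.Size([2, 4096, 320])
```

Because the residual path starts at zero, such an adapter can in principle be trained on top of a frozen SD backbone and later attached or detached without changing the base model's behavior, which is what "plug-and-play" suggests in the abstract.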
