Poster
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang · DUO PENG · Feng Chen · Yuwei Yang · Yinjie Lei
Image synthesis is a crucial task with broad applications, such as artistic creation and virtual reality. The difficulty of controlling generated images has motivated the task of conditional image synthesis. Current methods for conditional image synthesis remain limited, however: they are often task-specific and narrow in scope, each handling one restricted type of condition. In this paper, we propose a novel approach that treats conditional image synthesis as the modular combination of fundamental condition units. This perspective allows us to develop a framework for modular conditional generation, significantly enhancing the model's adaptability to diverse conditional generation tasks and greatly expanding its application range. Specifically, we divide conditions into three primary units: text, layout, and drag. To enable effective control over these conditions, we design a dedicated alignment module for each. For the text condition, we introduce a Dense Concept Alignment (DCA) module, which achieves dense visual-text alignment by drawing on diverse textual concepts. For the layout condition, we propose a Dense Geometry Alignment (DGA) module, which imposes comprehensive geometric constraints to ensure adherence to the spatial configuration of the layout condition. For the drag condition, we introduce a Dense Motion Alignment (DMA) module, which applies multi-level motion regularization so that each pixel follows its desired trajectory without visual artifacts. By flexibly inserting and combining these condition modules, our framework enables highly controllable image generation. Extensive experiments demonstrate the superior performance of our framework across a variety of conditions, including textual captions, layout masks (or boxes), drag manipulations, and their combinations. Our code will be released.
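The idea of inserting and combining condition modules can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how DCA, DGA, or DMA compute their alignment terms, so the sketch assumes each module exposes the gradient of some alignment loss with respect to the latent, and that guidance is a weighted sum of those gradients. The names `ConditionModule`, `guided_step`, and `quadratic_grad` are hypothetical, and simple quadratic losses stand in for the real alignment objectives.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class ConditionModule:
    """Hypothetical pluggable condition unit (e.g. text, layout, or drag)."""
    name: str
    weight: float
    # Gradient of this module's alignment loss with respect to the latent.
    grad: Callable[[Sequence[float]], Vector]


def guided_step(latent: Sequence[float],
                modules: Sequence[ConditionModule],
                step_size: float) -> Vector:
    """One guidance update: descend the weighted sum of module losses."""
    total = [0.0] * len(latent)
    for module in modules:
        for i, g in enumerate(module.grad(latent)):
            total[i] += module.weight * g
    return [x - step_size * t for x, t in zip(latent, total)]


def quadratic_grad(target: Sequence[float]) -> Callable[[Sequence[float]], Vector]:
    """Toy stand-in loss 0.5 * ||x - target||^2, whose gradient is x - target."""
    return lambda x: [xi - ti for xi, ti in zip(x, target)]


# Two toy modules pulling the latent toward different targets.
text_module = ConditionModule("text", 1.0, quadratic_grad([1.0, 0.0]))
layout_module = ConditionModule("layout", 1.0, quadratic_grad([0.0, 1.0]))

latent = [0.0, 0.0]
for _ in range(200):
    latent = guided_step(latent, [text_module, layout_module], step_size=0.1)

print(latent)  # settles near the compromise between the two targets
```

Adding or removing a condition here only changes the list passed to `guided_step`, which is the modular property the abstract describes. In an actual training-free diffusion sampler, an update of this kind would be interleaved with the denoising steps of a pretrained model rather than run on its own.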