

AnyScene: Customized Image Synthesis with Composited Foreground

Ruidong Chen · Lanjun Wang · Weizhi Nie · Yongdong Zhang · An-An Liu

Arch 4A-E Poster #380
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Recent progress in text-to-image generation has significantly advanced the field of image customization. Among its applications, generating diverse scenes around user-specified composited foreground elements has great practical value but remains underexplored. To address this gap, we propose AnyScene, a specialized framework that creates varied scenes from a composited foreground using textual prompts. AnyScene addresses the primary challenges of existing methods, particularly scene disharmony caused by a lack of foreground semantic understanding, and distortion of foreground elements. Specifically, we develop a foreground injection module that guides a pre-trained diffusion model to generate cohesive scenes in visual harmony with the provided foreground. To make generation more robust, we implement a layout control strategy that prevents distortion of foreground elements. Furthermore, an efficient image blending mechanism seamlessly reintegrates foreground details into the generated scenes, producing outputs with overall visual harmony and precise foreground details. In addition, we propose a new benchmark and a series of quantitative metrics to evaluate this image customization task. Extensive experimental results demonstrate the effectiveness of AnyScene and confirm its potential in various applications.
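
To make the idea of reintegrating foreground details into a generated scene concrete, here is a minimal sketch of plain mask-based alpha compositing. It is not AnyScene's blending mechanism, which the abstract does not specify; the function name `blend_foreground`, the grayscale nested-list image representation, and the sample values are all assumptions chosen for illustration.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of intensities in [0, 1].
Image = List[List[float]]


def blend_foreground(scene: Image, foreground: Image, mask: Image) -> Image:
    """Composite `foreground` over `scene` using a per-pixel alpha `mask`.

    A mask value of 1.0 keeps the foreground pixel, 0.0 keeps the scene
    pixel, and values in between mix the two linearly (soft edges).
    Hypothetical helper for illustration, not the paper's method.
    """
    if not (len(scene) == len(foreground) == len(mask)):
        raise ValueError("scene, foreground and mask must have the same height")
    out: Image = []
    for s_row, f_row, m_row in zip(scene, foreground, mask):
        if not (len(s_row) == len(f_row) == len(m_row)):
            raise ValueError("scene, foreground and mask must have the same width")
        # Standard alpha compositing: out = m * fg + (1 - m) * bg
        out.append([m * f + (1.0 - m) * s for s, f, m in zip(s_row, f_row, m_row)])
    return out


# Toy 2x2 example: a dark generated scene and a bright foreground object.
scene = [[0.2, 0.2], [0.2, 0.2]]
foreground = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1.0, 0.5], [0.0, 0.0]]  # top-left fully foreground, top-right soft edge
blended = blend_foreground(scene, foreground, mask)
```

In this toy case the fully masked pixel keeps the original foreground value, the unmasked row keeps the generated scene, and the half-masked pixel is an even mix of the two. A learned or diffusion-aware blending step would presumably go beyond this linear mix to harmonize lighting and boundaries.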
