Poster

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

Su Wang ⋅ Chitwan Saharia ⋅ Ceslee Montgomery ⋅ Jordi Pont-Tuset ⋅ Shai Noy ⋅ Stefano Pellegrini ⋅ Yasumasa Onoe ⋅ Sarah Laszlo ⋅ David J. Fleet ⋅ Radu Soricut ⋅ Jason Baldridge ⋅ Mohammad Norouzi ⋅ Peter Anderson ⋅ William Chan

Highlight

2023 Poster


Abstract

Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to the input text prompt while remaining consistent with the input image. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, a property achieved by incorporating object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high-resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images, covering objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object masking during training leads to across-the-board improvements in text-image alignment, such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion. We also find that, as a cohort, these models are better at object rendering than text rendering, and handle material/color/size attributes better than count/shape attributes.
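The abstract credits Imagen Editor's text alignment to inpainting masks proposed by object detectors during training. The following is a minimal sketch of how such training masks could be produced, assuming detector bounding boxes are already available for each image. The box format, the `p_object` mixing probability, the random-box fallback and its size ranges, and all function names are hypothetical illustrations, not the paper's actual procedure.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def box_to_mask(box: Box, height: int, width: int) -> List[List[int]]:
    """Rasterize a box into a binary mask (1 = region to inpaint)."""
    x0, y0, x1, y1 = box
    # Clip the box to the image bounds.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


def random_box(height: int, width: int, rng: random.Random) -> Box:
    """Random box spanning roughly 10%-50% of each side (assumed ranges)."""
    h = rng.randint(max(1, height // 10), max(1, height // 2))
    w = rng.randint(max(1, width // 10), max(1, width // 2))
    y0 = rng.randint(0, height - h)
    x0 = rng.randint(0, width - w)
    return (x0, y0, x0 + w, y0 + h)


def propose_training_mask(
    detections: Sequence[Box],
    height: int,
    width: int,
    rng: random.Random,
    p_object: float = 0.5,
) -> List[List[int]]:
    """Mask a detected object with probability p_object, else a random box.

    Object masks force the model to re-render a whole object from the text
    prompt; the random-box fallback (an assumption here) keeps some
    generic, non-object masks in the training mix.
    """
    if detections and rng.random() < p_object:
        box = rng.choice(list(detections))
    else:
        box = random_box(height, width, rng)
    return box_to_mask(box, height, width)
```

For example, `propose_training_mask([(2, 1, 5, 4)], 8, 8, random.Random(0), p_object=1.0)` always masks the single detected 3x3 object region. Setting `p_object=0.0` recovers a purely random-box masking baseline, which is the kind of contrast the abstract's object-masking comparison refers to.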
