

Poster

Towards Context-Stable and Hue-Consistent Image Inpainting

Yikai Wang · Chenjie Cao · Junqiu Yu · Ke Fan · Xiangyang Xue · Yanwei Fu


Abstract:

Recent advances in image inpainting increasingly use generative models to handle large irregular masks. However, these models can create unrealistic inpainted images due to two main issues: (1) Context instability: even with unmasked areas as context, generative models may still generate arbitrary objects in the masked region that do not align with the rest of the image. (2) Hue inconsistency: inpainted regions often exhibit color shifts that cause a smeared appearance, reducing image quality. Retraining the generative model could help solve these issues, but it is costly, since state-of-the-art latent-based diffusion and rectified flow models require a three-stage training process: training a VAE, training a generative U-Net or transformer, and fine-tuning for inpainting. Instead, this paper proposes a post-processing approach, dubbed ASUKA (Aligned Stable inpainting with UnKnown Areas prior), to improve inpainting models. To address context instability, we leverage a Masked Auto-Encoder (MAE) for reconstruction-based priors. This strengthens context alignment while maintaining the model's generation capabilities. To address hue inconsistency, we propose a specialized VAE decoder that treats latent-to-image decoding as a local harmonization task, significantly reducing color shifts for hue-consistent inpainting. We validate ASUKA on SD 1.5 and FLUX inpainting variants using the Places2 benchmark and MISATO, our proposed diverse collection of datasets. Results show that ASUKA improves context stability and hue consistency over standard diffusion and rectified flow models and other inpainting methods. Code, model, and dataset will be released.
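
A minimal sketch of the pipeline described above may help make the data flow concrete. Everything below is a hypothetical illustration, not the released ASUKA implementation: the names asuka_inpaint, mae, align, inpaint_unet, and harmonize_decode are placeholder stand-ins for the components named in the abstract (MAE-based prior, latent inpainting model, harmonization-aware VAE decoder), and their signatures are assumed.

    import torch

    def asuka_inpaint(image, mask, mae, align, vae, inpaint_unet, harmonize_decode):
        """
        image: (B, 3, H, W) tensor in [-1, 1]; mask: (B, 1, H, W), 1 marks the hole.
        All model arguments are hypothetical stand-ins for the components named
        in the abstract; this only illustrates the intended data flow.
        """
        with torch.no_grad():
            # 1) Reconstruction-based prior: the MAE fills the hole from the
            #    unmasked context, giving a blurry but context-aligned estimate.
            prior = mae(image * (1 - mask), mask)                  # (B, 3, H, W)

            # 2) Align the prior with the generative model's latent space and
            #    feed it as an extra condition to the latent inpainting model.
            prior_latent = align(vae.encode(prior))                # (B, C, h, w)
            masked_latent = vae.encode(image * (1 - mask))
            filled_latent = inpaint_unet(masked_latent, mask, cond=prior_latent)

            # 3) Hue-consistent decoding: a decoder trained as a local harmonizer
            #    blends the generated content with the known pixels, reducing
            #    color shifts at the mask boundary.
            return harmonize_decode(filled_latent, image, mask)

In this sketch the pretrained generative model itself is left untouched; only the prior alignment and the decoding step are added around it, which reflects the post-processing framing of the abstract.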
