

ZONE: Zero-Shot Instruction-Guided Local Editing

Shanglin Li · Bohan Zeng · Yutang Feng · Sicheng Gao · Xuhui Liu · Jiaming Liu · Li Lin · Xu Tang · Yao Hu · Jianzhuang Liu · Baochang Zhang

Arch 4A-E Poster #141
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT

Abstract: Recent advances in vision-language models like Stable Diffusion have shown remarkable power in creative image synthesis and editing. However, most existing text-to-image editing methods encounter two obstacles. First, the text prompt needs to be carefully crafted to achieve good results, which is not intuitive or user-friendly. Second, they are insensitive to local edits and can irreversibly affect non-edited regions, leaving obvious editing traces. To tackle these problems, we propose a Zero-shot instructiON-guided local image Editing approach, termed $\texttt{ZONE}$. We first convert the editing intent from the user-provided instruction (e.g., "make his tie blue") into specific image editing regions through InstructPix2Pix. We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segmentation model. We further develop an FFT-based edge smoother for seamless blending between the layer and the image. Our method allows for arbitrary manipulation of a specific region with a single instruction while preserving the rest. Extensive experiments demonstrate that our $\texttt{ZONE}$ achieves remarkable local editing results and user-friendliness, outperforming state-of-the-art methods. Code is available at
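
The last two steps of the pipeline described in the abstract (choosing a segment that matches the edit region, then softening its edge before blending) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `iou`, `pick_mask`, `fft_smooth_mask`, and `blend` are hypothetical, plain intersection-over-union stands in for the paper's Region-IoU score (whose exact definition is not given in the abstract), and the smoother is assumed to be a Gaussian low-pass filter applied in the Fourier domain. The coarse edit region and the candidate masks are taken as given inputs.

```python
import numpy as np


def iou(a, b):
    """Plain intersection-over-union of two boolean masks (stand-in score)."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def pick_mask(region, candidates):
    """Return (index, score) of the candidate mask that best overlaps `region`."""
    scores = [iou(region, m) for m in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]


def fft_smooth_mask(mask, sigma=0.1):
    """Soften a binary mask with a Gaussian low-pass filter in the Fourier domain.

    `sigma` is the filter width in cycles per pixel (an assumed parameter).
    The FFT treats the image as periodic, so masks touching the border wrap.
    """
    h, w = mask.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies
    lowpass = np.exp(-(fx ** 2 + fy ** 2) / (2.0 * sigma ** 2))
    soft = np.fft.ifft2(np.fft.fft2(mask.astype(float)) * lowpass).real
    return np.clip(soft, 0.0, 1.0)


def blend(edited, original, alpha):
    """Alpha-composite the edited layer over the original image."""
    if edited.ndim == 3:  # broadcast alpha over colour channels
        alpha = alpha[..., None]
    return alpha * edited + (1.0 - alpha) * original


if __name__ == "__main__":
    region = np.zeros((16, 16), dtype=bool)
    region[4:12, 4:12] = True            # coarse edit region (assumed given)
    near = np.zeros_like(region)
    near[5:13, 5:13] = True              # candidate segment that overlaps it
    far = np.zeros_like(region)
    far[0:3, 0:3] = True                 # unrelated candidate segment
    idx, score = pick_mask(region, [far, near])
    alpha = fft_smooth_mask([far, near][idx])
    original = np.zeros((16, 16, 3))
    edited = np.ones((16, 16, 3))
    out = blend(edited, original, alpha)
    print(idx, round(score, 3), out.shape)
```

In this sketch the soft alpha mask is what avoids a visible seam: pixels well inside the selected segment take the edited layer, pixels well outside keep the original, and pixels near the boundary are mixed.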
