

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

Shelly Sheynin · Adam Polyak · Uriel Singer · Yuval Kirstain · Amit Zohar · Oron Ashual · Devi Parikh · Yaniv Taigman

Arch 4A-E Poster #395
Highlight
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle to execute user instructions accurately. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing. To develop Emu Edit, we train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and computer vision tasks, all of which are formulated as generative tasks. Additionally, to enhance Emu Edit's multi-task learning abilities, we provide it with learned task embeddings which guide the generation process towards the correct edit type. Both of these elements are essential for Emu Edit's outstanding performance. Furthermore, we show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples. This capability offers a significant advantage in scenarios where high-quality samples are scarce. Lastly, to facilitate a more rigorous and informed assessment of instructable image editing models, we release a new challenging and versatile benchmark that includes seven different image editing tasks.
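
The learned task embeddings mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the task names, the embedding size, and the choice to add the task vector to a conditioning vector are all assumptions made for illustration, and the random initial values stand in for parameters that would be learned during training.

```python
import random

# Hypothetical task names, loosely following the task families named in the abstract.
TASKS = ["region_based_editing", "free_form_editing", "vision_task"]
EMBED_DIM = 8  # illustrative size, not taken from the paper


def init_task_embeddings(tasks, dim, seed=0):
    """Create one vector per task; in training these would be learned parameters."""
    rng = random.Random(seed)
    return {task: [rng.gauss(0.0, 0.02) for _ in range(dim)] for task in tasks}


def condition_on_task(base_conditioning, task, table):
    """Add the task's embedding to a conditioning vector (assumed injection scheme)."""
    if task not in table:
        raise KeyError(f"unknown task: {task}")
    task_vec = table[task]
    if len(task_vec) != len(base_conditioning):
        raise ValueError("conditioning and task embedding must have the same length")
    return [b + t for b, t in zip(base_conditioning, task_vec)]


table = init_task_embeddings(TASKS, EMBED_DIM)
base = [0.0] * EMBED_DIM  # stand-in for whatever conditioning signal the model already uses
conditioned = condition_on_task(base, "free_form_editing", table)
```

In this sketch, the same base conditioning yields a different vector for each task, which is the sense in which a task embedding can steer generation towards one edit type rather than another.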
