Coupled Diffusion Sampling for Training-free Multi-view Image Editing
Abstract
Given a collection of multi-view images, we perform consistent multi-view editing with a training-free framework using pre-trained 2D editing models and a generative multi-view model. While 2D editing models can independently edit each image in a set of multi-view images of a 3D scene, they do not maintain consistency across views. Existing approaches typically rely on explicit 3D representations to average out the inconsistencies, but they suffer from lengthy optimization, are unstable under sparse-view settings, and can produce blurry results. We address the problem through a different lens: we use the 2D editing model to steer a multi-view generative model during the diffusion sampling process. This is achieved through our novel coupled diffusion sampling process. We concurrently sample two trajectories, one from a multi-view image distribution and one from a 2D edited image distribution, and connect the samples with a coupling term. Effectively, the two models guide each other during sampling, and the resulting sample from the multi-view model remains consistent across views while satisfying the desired edit. We validate the effectiveness and generality of this framework on three distinct multi-view image editing tasks, and demonstrate its applicability across various model architectures. We further illustrate the effects of coupling on state-of-the-art image and video generation models, highlighting the potential of our method beyond multi-view editing.
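The coupled sampling idea described above can be sketched as follows. This is a minimal toy illustration, not the paper's actual method: the two denoising steps are hypothetical placeholders standing in for a real multi-view model and a real 2D editing model, and the coupling weight `lam` is an assumed hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(0)

def mv_denoise_step(x, t):
    # Hypothetical placeholder for one reverse-diffusion step
    # of a multi-view generative model.
    return x - 0.1 * t * x

def edit_denoise_step(y, t):
    # Hypothetical placeholder for one reverse-diffusion step
    # of a 2D editing model.
    return y - 0.1 * t * y

def coupled_sampling(shape, steps=50, lam=0.5):
    """Run two diffusion trajectories concurrently, connected by a
    coupling term, so each model steers the other's sample."""
    x = rng.standard_normal(shape)  # multi-view trajectory
    y = rng.standard_normal(shape)  # 2D-edit trajectory
    for i in range(steps, 0, -1):
        t = i / steps
        x = mv_denoise_step(x, t)
        y = edit_denoise_step(y, t)
        # Coupling term: pull each sample toward the other
        # (both updates use the pre-coupling values).
        x, y = x + lam * t * (y - x), y + lam * t * (x - y)
    return x  # the multi-view sample, nudged toward the edited distribution

sample = coupled_sampling((2, 4, 4))  # e.g. 2 views of a 4x4 "image"
```

The key design choice is that neither model is fine-tuned: consistency and the edit are reconciled purely at sampling time through the coupling update inside the loop.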