VDFE: Difference-Aware 3D Scene Editing with Non-Intrusive Video Diffusion Priors for Multi-View Consistency and Efficiency
Abstract
Text-driven 3D editing, enabled by advances in 3D reconstruction techniques such as NeRF and 3D Gaussian Splatting, aims to provide intuitive scene customization. However, existing methods frequently exhibit limited controllability and multi-view inconsistency. To address these shortcomings, we propose \textbf{VDFE}, a difference-aware 3D scene editing method that makes non-intrusive use of pre-trained video diffusion priors and integrates Optimal Control Guided Flow Editing (FlowOCE), Decoupled Flow Difference (DFD), and Difference-Aware Gaussians Editing (DAGE). Specifically, FlowOCE formulates the editing process as an optimal control problem, optimizing a noise-free editing trajectory that minimizes unintended modifications to non-target regions; DFD precisely localizes the editing region by analyzing flow differences, supplying priors for the subsequent optimization; and DAGE leverages this precise localization to selectively update 3D Gaussians for efficient and accurate refinement. Extensive experiments demonstrate that our method significantly outperforms existing methods in both qualitative and quantitative evaluations, achieving state-of-the-art (SOTA) performance.