SpatialDiff: 3D-Aware Object Movement via Implicit Spatial Modeling
Zheng Liu ⋅ Zijian He ⋅ Huiguo He ⋅ Weizhi Zhong ⋅ Yejun Tang ⋅ Huan Yang ⋅ Kun Gai ⋅ Guanbin Li
Abstract
While recent advances in image editing allow impressive manipulation of objects, existing methods still struggle with spatial manipulation in complex scenes, such as when objects span different depth layers or are partially occluded. Most image editing methods rely solely on prior information from 2D datasets, emphasizing planar features while lacking support for spatial positional structure. Even approaches that incorporate explicit positional information fail to capture true 3D spatial relationships, limiting accurate object movement in complex scenes. In this paper, we present $\textbf{SpatialDiff}$, a method that effectively captures 3D spatial structure, enabling precise and consistent object movement in complex scenes. Our core innovations are twofold: (1) $\textbf{Implicit 3D Spatial Modeling}$, which introduces 3D prior knowledge and enables the model to internally build a comprehensive understanding of the three-dimensional spatial structure; and (2) $\textbf{Global Spatial Supervision}$, which constrains the latent spatial features so that the model can perceive changes in object positions caused by editing operations. Experimental results demonstrate that our method significantly improves the accuracy and fidelity of spatial manipulation in complex scenes.