Poster
VideoSPatS: Video Spatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing
Juan Luis Gonzalez Bello · Xu Yao · Alex Whelan · Kyle Olszewski · Hyeongwoo Kim · Pablo Garrido
Abstract:
We present an implicit video representation for occlusion, appearance, and motion disentanglement from monocular videos, which we refer to as Video Spatiotemporal Splines (VideoSPatS). Unlike previous methods that map time and coordinates to deformation and canonical colors, our VideoSPatS maps input coordinates into Spatial and Color Spline deformation fields, which disentangle motion and appearance in videos. With spline-based parametrization, our method naturally generates temporally consistent flow and guarantees long-term temporal consistency, which is crucial for convincing video editing. Aided by additional prediction blocks, our VideoSPatS also performs layer separation between the latent video and the selected occluder. By disentangling occlusions, appearance, and motion, our method allows for better spatiotemporal modeling and editing of diverse videos, including in-the-wild talking head videos with challenging occlusions, shadows, and specularities, while maintaining a reasonable canonical space for editing. We also present general video modeling results on the DAVIS and CoDeF datasets, as well as our own talking head video dataset collected from open-source web videos. Extensive ablations show that the combination of spatial and color spline fields under neural splines can overcome motion and appearance ambiguities, paving the way to more advanced video editing models.
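To make the spline-based parametrization concrete, below is a minimal sketch (not the authors' code) of a temporal spline deformation field in PyTorch: a hypothetical MLP maps each spatial coordinate to a set of temporal control points, and the deformation at time t is obtained by evaluating a uniform cubic B-spline over those control points, which keeps the predicted motion smooth and temporally consistent by construction. The module names, control-point count, and network sizes are illustrative assumptions, not details from the paper.

```python
# Sketch of a spline-parameterized deformation field (assumed design, not the
# authors' implementation). An MLP predicts K temporal control points per
# spatial location; displacements at time t come from a cubic B-spline.
import torch
import torch.nn as nn


def cubic_bspline_basis(u: torch.Tensor) -> torch.Tensor:
    """Uniform cubic B-spline basis weights for local parameter u in [0, 1)."""
    u2, u3 = u * u, u * u * u
    b0 = (1 - u) ** 3 / 6.0
    b1 = (3 * u3 - 6 * u2 + 4) / 6.0
    b2 = (-3 * u3 + 3 * u2 + 3 * u + 1) / 6.0
    b3 = u3 / 6.0
    return torch.stack([b0, b1, b2, b3], dim=-1)  # (..., 4)


class SplineDeformationField(nn.Module):
    """Maps (x, y) -> K temporal control points, evaluated at time t."""

    def __init__(self, num_ctrl: int = 16, out_dim: int = 2, hidden: int = 128):
        super().__init__()
        self.num_ctrl = num_ctrl
        self.out_dim = out_dim
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_ctrl * out_dim),
        )

    def forward(self, xy: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # xy: (N, 2) normalized coordinates, t: (N,) normalized time in [0, 1].
        ctrl = self.mlp(xy).view(-1, self.num_ctrl, self.out_dim)  # (N, K, D)
        # Locate the spline segment and local parameter for each query time.
        seg = t * (self.num_ctrl - 3)                               # K pts -> K-3 segments
        idx = seg.floor().clamp(max=self.num_ctrl - 4).long()       # (N,)
        u = seg - idx.float()                                       # (N,)
        basis = cubic_bspline_basis(u)                              # (N, 4)
        # Gather the 4 control points supporting each segment.
        offsets = torch.arange(4, device=xy.device)
        gather_idx = (idx[:, None] + offsets)[..., None].expand(-1, -1, self.out_dim)
        pts = torch.gather(ctrl, 1, gather_idx)                     # (N, 4, D)
        return (basis[..., None] * pts).sum(dim=1)                  # (N, D)


# Usage: a spatial spline (out_dim=2) could model motion while a color spline
# (out_dim=3) models appearance changes under the same parametrization.
field = SplineDeformationField(num_ctrl=16, out_dim=2)
xy = torch.rand(1024, 2) * 2 - 1
t = torch.rand(1024)
flow = field(xy, t)  # (1024, 2) temporally smooth displacements
```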