

Poster

STDD: Spatio-Temporal Dual Diffusion for Video Generation

Shuaizhen Yao · Xiaoya Zhang · Xin Liu · Mengyi Liu · Zhen Cui


Abstract:

Diffusion probabilistic models are becoming the cornerstone of data generation, especially for generating high-quality images. As an extension, video diffusion generation is in urgent need of a principled temporal-sequence diffusion scheme, whereas spatial-domain diffusion dominates most existing video diffusion methods. In this work, we propose an explicit Spatio-Temporal Dual Diffusion (STDD) method that extends the standard diffusion model, in a principled way, to a spatio-temporal diffusion model for joint spatial and temporal noise propagation/reduction. Mathematically, an analysable dual diffusion process is derived that accumulates noise/information along the temporal sequence as well as in the spatial domain. Correspondingly, we theoretically derive a spatio-temporal probabilistic reverse diffusion process and propose an accelerated sampling scheme to reduce the inference cost. In principle, the spatio-temporal dual diffusion enables information from previous frames to be transferred to the current frame, which can benefit video consistency. Extensive experiments demonstrate that our proposed STDD compares favourably against state-of-the-art methods on video generation/prediction as well as text-to-video generation.
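
The abstract does not spell out the factorization, but as a minimal sketch of what a dual forward process could look like, assume a DDPM-style Gaussian transition in which frame i at noise level t depends both on its own previous noise level (the spatial direction) and on the preceding frame at the same noise level (the temporal direction). The temporal-mixing coefficient \lambda_t below is a hypothetical placeholder for illustration, not notation taken from the paper:

% Standard spatial-only DDPM forward step for a single frame x^i:
q(x_t^i \mid x_{t-1}^i) = \mathcal{N}\!\left(x_t^i;\ \sqrt{1-\beta_t}\,x_{t-1}^i,\ \beta_t I\right)

% Illustrative spatio-temporal dual step: frame i also accumulates
% information from the preceding frame i-1 via the hypothetical weight \lambda_t:
q(x_t^i \mid x_{t-1}^i,\, x_t^{i-1}) = \mathcal{N}\!\left(x_t^i;\ \sqrt{1-\beta_t}\,x_{t-1}^i + \lambda_t\, x_t^{i-1},\ \beta_t I\right)

Under such a factorization, marginalizing over earlier frames lets noise, and hence information, from frame i-1 propagate into frame i, which matches the abstract's claim that previous-frame information is transferred to the current frame.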
