RFDM: Residual Flow Diffusion Models for Video Editing
Mohammadreza Salehi ⋅ Mehdi Noroozi ⋅ Luca Morreale ⋅ Ruchika Chavhan ⋅ Malcolm Chadwick ⋅ Alberto Gil Couto Pimentel Ramos ⋅ Abhinav Mehrotra
Abstract
Autoregressive video generative methods have recently become popular due to their flexibility for variable-length video generation and their computational efficiency. However, their use in video editing remains relatively unexplored. This paper introduces an efficient causal video editing model that edits a video frame by frame. Specifically, we adapt an image-to-image (I2I) model to video-to-video (V2V) editing, where the edit at frame $t$ is conditioned on the model's prediction at frame $t-1$. To exploit past predictions more effectively, we condition the sampling noise on the previous prediction during the diffusion forward process. This forward process guides the model to explicitly compute the residual between the target and the previous prediction during denoising; we denote this formulation the Residual-Flow Diffusion Model (RFDM). We initialize RFDM from the text-to-image SD1.5 model and train it on the Señorita dataset for global style transfer, local style transfer, and object removal. RFDM achieves results competitive with computationally heavier counterparts while being significantly more efficient. The latency of our method scales linearly with the number of frames, making it the most efficient diffusion-based video editing framework.
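To make the residual-flow idea concrete, the following is a minimal sketch of how the forward process described above could be set up, assuming a flow-matching-style interpolation; the function name, the noise scale `sigma`, and the exact interpolation are illustrative assumptions, not the paper's equations.

```python
import torch

def residual_flow_forward(x_target, x_prev, t, sigma=0.1):
    """Hypothetical sketch of a residual-flow forward process.

    x_target: edited target frame at time t (latent)
    x_prev:   model prediction for frame t-1 (latent)
    t:        scalar flow time in [0, 1]
    """
    # Condition the sampling noise on the past prediction: the flow
    # starts from a noised version of the previous frame's prediction
    # rather than from pure Gaussian noise.
    noise = torch.randn_like(x_target)
    x_start = x_prev + sigma * noise

    # Flow-matching style interpolation between the conditioned start
    # point and the target edited frame.
    x_t = (1.0 - t) * x_start + t * x_target

    # The regression target for the denoiser is then (approximately) the
    # residual between the target and the previous prediction.
    velocity_target = x_target - x_start
    return x_t, velocity_target
```

Under this sketch, when the previous prediction is already close to the target, the velocity target is small, which is consistent with the abstract's claim that the model explicitly computes a residual during denoising.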