

Poster

Unity in Diversity: Video Editing via Gradient-Latent Purification

Junyu Gao · Kunlin Yang · Xuan Yao · Yufan Hu


Abstract:

Recently, text-driven video editing methods that optimize target latent representations have garnered significant attention and demonstrated promising results. However, these methods rely on self-supervised objectives to compute the gradients used to update the latent representations, which inevitably introduces gradient noise and degrades content generation quality. Moreover, the optimal stopping point of the editing process is hard to determine, so the latent representation rarely converges to an optimal solution. To address these issues, we propose a unified gradient-latent purification framework that collects gradient and latent information across different stages to identify effective and concordant update directions. We design a local coordinate system construction method based on feature decomposition, enabling short-term gradients and final-stage latents to be reprojected onto new axes. We then employ tailored coefficient regularization terms to effectively aggregate the decomposed information. Additionally, a temporal smoothing axis extension strategy is developed to enhance the temporal coherence of the generated content. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art methods across various editing tasks, delivering superior editing performance. Code is available in the Supplementary Material.
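The core idea sketched in the abstract, reprojecting gradients and latents onto locally constructed axes and aggregating the coefficients under regularization, can be illustrated schematically. The sketch below is not the authors' implementation: the SVD-based axis construction, the function names (`local_axes`, `purify`), and the simple shrinkage blend are all assumptions standing in for the paper's feature decomposition and tailored coefficient regularization.

```python
import numpy as np

def local_axes(features, k=4):
    """Illustrative local coordinate system: top-k right singular
    vectors of a feature matrix serve as orthonormal axes."""
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:k]                       # shape (k, d), orthonormal rows

def purify(grad, latent, axes, lam=0.1):
    """Reproject a short-term gradient and a final-stage latent onto
    `axes`, then blend their coefficients with a simple shrinkage
    (a stand-in for the paper's coefficient regularization)."""
    g = axes @ grad                     # gradient coefficients on the axes
    z = axes @ latent                   # latent coefficients on the axes
    coef = (g + z) / (2.0 + lam)        # regularized aggregation (illustrative)
    return axes.T @ coef                # map the purified update back

rng = np.random.default_rng(0)
feats = rng.standard_normal((32, 16))   # 32 feature vectors of dimension 16
axes = local_axes(feats, k=4)
update = purify(rng.standard_normal(16), rng.standard_normal(16), axes)
```

The resulting `update` lies entirely in the span of the local axes, so components of the raw gradient outside that subspace (here playing the role of gradient noise) are discarded before the latent is updated.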
