

CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-driven Video Editing

Guiwei Zhang · Tianyu Zhang · Guanglin Niu · Zichang Tan · Yalong Bai · Qing Yang

Arch 4A-E Poster #415
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Text-driven video editing poses significant challenges in achieving flicker-free visual continuity while preserving the inherent motion patterns of the original video. Existing methods operate under a paradigm in which motion and appearance are intricately intertwined. This coupling leads the network either to overfit appearance content, failing to capture motion patterns, or to focus on motion patterns at the expense of generalizing content to diverse textual scenarios. Inspired by the pivotal role of the wavelet transform in dissecting video sequences, we propose CAusal Motion Enhancement tailored for Lifting text-driven video editing (CAMEL), a novel technique with two core designs. First, we introduce motion prompts, designed to summarize motion concepts from video templates through direct optimization. The optimized prompts are purposefully integrated into the latent representations of diffusion models to enhance the motion fidelity of generated results. Second, to enhance motion coherence and extend the generalization of appearance content to creative textual prompts, we propose a causal motion-enhanced attention mechanism. This mechanism is implemented in tandem with a novel causal motion filter, synergistically enhancing the motion coherence of the disentangled high-frequency components while preserving the generalization of appearance content across various textual scenarios. Extensive experimental results show the superior performance of CAMEL.
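To make the wavelet-based disentanglement mentioned above concrete, the sketch below shows a single-level temporal Haar transform that splits a video latent into a low-frequency component (slowly varying appearance) and a high-frequency component (frame-to-frame motion cues). This is an illustrative reconstruction of the general idea only: the function names, tensor shapes, and the choice of a Haar basis are assumptions, not the paper's actual implementation.

```python
import numpy as np

def haar_temporal_split(latents):
    """Hypothetical sketch: split a (T, C, H, W) latent (even T) along time
    into low-frequency (appearance) and high-frequency (motion) halves."""
    even, odd = latents[0::2], latents[1::2]
    low = (even + odd) / np.sqrt(2.0)   # approximation coefficients
    high = (even - odd) / np.sqrt(2.0)  # detail coefficients
    return low, high

def haar_temporal_merge(low, high):
    """Inverse transform: exactly reconstructs the original sequence."""
    t2, c, h, w = low.shape
    out = np.empty((2 * t2, c, h, w), dtype=low.dtype)
    out[0::2] = (low + high) / np.sqrt(2.0)
    out[1::2] = (low - high) / np.sqrt(2.0)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 16, 16))      # toy video latent: 8 frames
low, high = haar_temporal_split(x)
x_rec = haar_temporal_merge(low, high)
print(np.allclose(x, x_rec))  # True: the split is lossless
```

Because the transform is invertible, a method in this vein can process the high-frequency (motion) band separately, e.g. with a dedicated attention mechanism, and merge it back without degrading appearance content.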
