

Poster

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven · Aliaksandr Siarohin · Sergey Tulyakov · Philip H.S. Torr · Fabio Pizzati

ExHall D Poster #176
Sun 15 Jun 8:30 a.m. PDT — 10:30 a.m. PDT

Abstract:

We propose DiTFlow, a method for transferring the motion of a reference video to a newly synthesized one, designed specifically for Diffusion Transformers (DiT). We first process the reference video with a pre-trained DiT to analyze cross-frame attention maps and extract a patch-wise motion signal called the Attention Motion Flow (AMF). We then guide the latent denoising process in an optimization-based, training-free manner: optimizing the latents with our AMF loss yields videos that reproduce the motion of the reference. We also apply our optimization strategy to the transformer positional embeddings, which boosts zero-shot motion transfer capabilities. We evaluate DiTFlow against recently published methods and outperform all of them across multiple metrics and in human evaluation. Our code will be open source.
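
To make the idea of a patch-wise motion signal derived from cross-frame attention more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the AMF is computed, so the soft-argmax displacement, the mean-distance loss, and the names `patch_displacements`, `motion_loss`, `grid_w`, and `temperature` are all hypothetical. The toy features are one-hot vectors on a 2x2 patch grid rather than real DiT queries and keys.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def patch_displacements(queries, keys, grid_w, temperature=1.0):
    """Assumed formulation: for each patch of frame A (queries), attend over
    all patches of frame B (keys) and return the attention-weighted expected
    target position minus the source position, as (dx, dy) in patch units."""
    displacements = []
    for idx, q in enumerate(queries):
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / temperature for k in keys]
        weights = softmax(scores)
        # Patches are indexed row-major: x = j % grid_w, y = j // grid_w.
        tx = sum(w * (j % grid_w) for j, w in enumerate(weights))
        ty = sum(w * (j // grid_w) for j, w in enumerate(weights))
        displacements.append((tx - idx % grid_w, ty - idx // grid_w))
    return displacements

def motion_loss(ref_disp, gen_disp):
    """Hypothetical loss: mean Euclidean distance between the reference and
    generated per-patch displacements."""
    total = 0.0
    for (rx, ry), (gx, gy) in zip(ref_disp, gen_disp):
        total += math.hypot(rx - gx, ry - gy)
    return total / len(ref_disp)

# Toy 2x2 grid: each patch carries a distinct one-hot feature.
e = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
queries = e                          # frame A, patches in row-major order
keys = [e[1], e[0], e[3], e[2]]      # frame B: content swapped horizontally

ref = patch_displacements(queries, keys, grid_w=2, temperature=0.01)
still = patch_displacements(queries, queries, grid_w=2, temperature=0.01)
print(ref)
print(motion_loss(ref, still))
```

In a guidance loop of the kind the abstract describes, a loss like this would be differentiated with respect to the latents (or positional embeddings) at each denoising step, which requires an autodiff framework instead of plain Python lists.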
