

Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

Bichen Wu · Ching-Yao Chuang · Xiaoyan Wang · Yichen Jia · Kapil Krishnakumar · Tong Xiao · Feng Liang · Licheng Yu · Peter Vajda

Arch 4A-E Poster #336
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


In this paper, we introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion models, enhancing them for video editing applications. Our approach centers on the concept of anchor-based cross-frame attention, a mechanism that implicitly propagates diffusion features across frames, ensuring superior temporal coherence and high-fidelity synthesis. Fairy not only addresses the memory and processing-speed limitations of previous models through optimized parallel computation, but also improves temporal consistency through a unique data augmentation strategy. This strategy renders the model equivariant to affine transformations in both source and target images. Remarkably efficient, Fairy generates 120 frames of high-resolution video (4-second duration at 30 FPS) in just 14 seconds, outpacing prior works by at least 44x. A comprehensive user study involving 1000 generated samples confirms that our approach delivers superior quality, decisively outperforming established methods.
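The core idea of anchor-based cross-frame attention can be sketched roughly as follows: every frame's queries attend over keys and values drawn from a small set of anchor frames, so anchor features are implicitly propagated to all frames. This is a minimal, single-head illustration in numpy; the function name, the absence of learned projections, and the anchor-selection interface are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def anchor_cross_frame_attention(frames, anchor_idx):
    """Single-head attention where each frame attends only to anchor frames.

    frames:     array of shape (F, T, D) -- F frames, T tokens each, dim D.
    anchor_idx: indices of the anchor frames (hypothetical interface).
    Returns an array of shape (F, T, D); every output token is a convex
    combination of anchor-frame tokens, tying all frames to shared features.
    """
    # Pool tokens from all anchor frames into one key/value bank: (A*T, D).
    anchors = np.concatenate([frames[i] for i in anchor_idx], axis=0)
    D = frames.shape[-1]
    out = []
    for q in frames:                                   # q: (T, D)
        attn = softmax(q @ anchors.T / np.sqrt(D))     # (T, A*T)
        out.append(attn @ anchors)                     # (T, D)
    return np.stack(out)
```

Because the key/value bank is fixed across frames, the per-frame attention calls are independent and can be computed in parallel, which is in the spirit of the parallelized design the abstract describes.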
