Poster
Align-A-Video: Deterministic Reward Tuning of Image Diffusion Models for Consistent Video Editing
Shengzhi Wang · Yingkang Zhong · Jiangchuan Mu · Kai WU · Mingliang Xiong · Wen Fang · Mingqing Liu · Hao Deng · Bin He · Gang Li · Qingwen Liu
Zero-shot video editing methods, which lack training and fine control over the denoising process, often fail to follow user instructions, producing videos that are visually unappealing and fall short of expectations. To address this problem, we propose Align-A-Video, a video editing pipeline that incorporates human feedback through reward fine-tuning. Our approach consists of two key steps: 1) Deterministic Reward Fine-tuning. To reduce the cost of optimizing over the expected noise distribution, we propose a deterministic reward tuning strategy that improves stability by making samples deterministic, allowing tuning to complete in minutes; 2) Feature Propagation Across Frames. We optimize a selected anchor frame and propagate its features to the remaining frames, improving both visual quality and semantic fidelity while avoiding the temporal-consistency degradation that reward optimization can cause. Extensive qualitative and quantitative experiments confirm the effectiveness of reward fine-tuning in Align-A-Video, significantly improving the overall quality of video generation.
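The two steps above can be illustrated with a minimal toy sketch. This is not the authors' implementation: the quadratic reward, the analytic gradient, the `blend` weight, and the list-of-floats "latents" are all simplifying assumptions standing in for a learned reward model and diffusion latents. It only shows the shape of the pipeline: deterministic gradient ascent on one anchor frame's latent, then blending the anchor's learned edit into the other frames.

```python
def reward(latent, target):
    """Toy stand-in for a reward model: higher when the latent
    better matches the desired edit (here, a fixed target)."""
    return -sum((z - t) ** 2 for z, t in zip(latent, target))

def tune_anchor(anchor, target, lr=0.1, steps=50):
    """Deterministic reward tuning (sketch): gradient ascent on a fixed
    anchor latent. No noise is resampled, so every run follows the
    same trajectory, which is what makes the tuning stable and fast."""
    z = list(anchor)
    for _ in range(steps):
        grad = [-2.0 * (zi - ti) for zi, ti in zip(z, target)]  # d(reward)/dz
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z

def propagate(frames, anchor_idx, tuned_anchor, blend=0.5):
    """Feature propagation (sketch): blend the anchor's learned edit
    (its delta) into every frame while keeping each frame's own
    content, so temporal consistency is not destroyed by per-frame
    reward optimization."""
    delta = [t - a for t, a in zip(tuned_anchor, frames[anchor_idx])]
    out = [[f + blend * d for f, d in zip(frame, delta)] for frame in frames]
    out[anchor_idx] = tuned_anchor
    return out

# Usage: three 4-dim "frame latents"; tune frame 1 as the anchor,
# then push its edit into frames 0 and 2.
frames = [[float(i)] * 4 for i in range(3)]
target = [5.0] * 4
tuned = tune_anchor(frames[1], target)
edited = propagate(frames, 1, tuned)
```

After tuning, the anchor's reward strictly improves, and the propagated frames also move toward the target edit without being replaced wholesale.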