

Poster

Align-A-Video: Deterministic Reward Tuning of Image Diffusion Models for Consistent Video Editing

Shengzhi Wang · Yingkang Zhong · Jiangchuan Mu · Kai WU · Mingliang Xiong · Wen Fang · Mingqing Liu · Hao Deng · Bin He · Gang Li · Qingwen Liu


Abstract:

Due to limited control over the denoising process and the absence of training, zero-shot video editing methods often fail to follow user instructions, producing videos that are visually unappealing and do not fully meet expectations. To address this problem, we propose Align-A-Video, a video editing pipeline that incorporates human feedback through reward fine-tuning. Our approach consists of two key steps: 1) Deterministic Reward Fine-tuning. To reduce the cost of optimizing over expected noise distributions, we propose a deterministic reward tuning strategy. By increasing sample determinism, this method improves tuning stability and allows the tuning process to be completed in minutes. 2) Feature Propagation Across Frames. We optimize a selected anchor frame and propagate its features to the remaining frames, improving both visual quality and semantic fidelity while avoiding the temporal consistency degradation that reward optimization can introduce. Extensive qualitative and quantitative experiments confirm the effectiveness of reward fine-tuning in Align-A-Video, which significantly improves the overall quality of the generated videos.
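The two-step pipeline can be illustrated with a minimal toy sketch. This is not the paper's implementation: the "frames", the differentiable reward, the learning rate, and the blending weight below are all illustrative stand-ins for the real diffusion-model components, chosen only to show the shape of the procedure (deterministic gradient ascent on one anchor frame, then feature propagation to the rest).

```python
import numpy as np

# Illustrative assumptions (not from the paper): frames are 16-dim feature
# vectors, the reward is the negative squared distance to a target edit,
# and sampling is deterministic (no fresh noise drawn per step).
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))      # 8 video frames, 16-dim "features"
target = np.ones(16)                   # proxy for the edit instruction

def reward(x):
    return -np.sum((x - target) ** 2)  # higher is better

# Step 1: deterministic reward fine-tuning of a single anchor frame.
# With a deterministic sample, the reward gradient is exact, so a few
# plain gradient-ascent steps suffice.
anchor = frames[0].copy()
lr = 0.1
for _ in range(50):
    grad = -2.0 * (anchor - target)    # d reward / d anchor
    anchor += lr * grad

# Step 2: propagate the tuned anchor's features to the remaining frames,
# blending rather than overwriting to preserve per-frame content.
alpha = 0.8
edited = np.stack([alpha * anchor + (1 - alpha) * f for f in frames])
```

In this toy, determinism is what makes step 1 cheap: there is no expectation over noise to estimate, so each update uses an exact gradient, mirroring the abstract's claim that tuning completes in minutes. The blend in step 2 keeps all frames anchored to one optimized reference, which is the mechanism the abstract credits for preserving temporal consistency.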
