

Poster

Hear What You See: Video-to-Audio Generation with Diffusion Transformer and Semantic-Temporal Alignment-Ranked Direct Preference Optimization

Kai Wang ⋅ Tao Zhou ⋅ Jiayi Lei ⋅ Jing Wang ⋅ Jinman Zhao ⋅ Weiguo Pian ⋅ Yuan Cheng ⋅ Yapeng Tian ⋅ Peng Gao ⋅ Bin Fu ⋅ Yihao Liu ⋅ Dimitrios Hatzinakos ⋅ Yuewen Cao

Abstract
