

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Zixuan Wang · Jia Jia · Shikun Sun · Haozhe Wu · Rong Han · Zhenyu Li · Di Tang · Jiaqing Zhou · Jiebo Luo

Arch 4A-E Poster #300
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Choreographers determine what a dance looks like, while camera operators determine how it is finally presented. Recently, various methods and datasets have demonstrated the feasibility of dance synthesis. However, camera movement synthesis conditioned on music and dance remains an unsolved, challenging problem due to the scarcity of paired data. Thus, we present DCM, a new multi-modal 3D dataset that, for the first time, combines camera movement with dance motion and music audio. The dataset comprises 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community, covering 4 music genres. With this dataset, we find that dance camera movement is multifaceted and human-centric, and is shaped by multiple influencing factors, making dance camera synthesis a more challenging task than camera or dance synthesis alone. To overcome these difficulties, we propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel bones attention loss and a condition separation strategy. For evaluation, we devise new metrics measuring camera movement quality, diversity, and dancer fidelity. Using these metrics, we conduct extensive experiments on our DCM dataset, providing both quantitative and qualitative evidence of the effectiveness of DanceCamera3D.
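The abstract names a "bones attention loss" but does not define it. As a rough intuition only, such a loss might weight camera reconstruction error by how close the camera is to the dancer's bones, emphasizing frames where framing the body matters most. The sketch below is entirely hypothetical (the function name, weighting scheme, and inputs are assumptions, not the paper's formulation):

```python
import numpy as np

def bone_weighted_camera_loss(pred, target, bone_dists):
    """Toy sketch of a bone-proximity-weighted reconstruction loss.

    pred, target : (T, D) predicted and ground-truth camera parameter
                   sequences over T frames.
    bone_dists   : (T,) distance from the camera to the nearest dancer
                   bone in each frame (hypothetical input).
    """
    # Closer bones -> larger weight, so body-centric frames count more.
    weights = 1.0 / (1.0 + bone_dists)
    # Mean squared error per frame across camera parameters.
    per_frame = ((pred - target) ** 2).mean(axis=1)
    # Weighted average over frames.
    return float((weights * per_frame).mean())
```

This is only an illustration of the general idea of weighting a loss by dancer geometry; the actual DanceCamera3D loss should be taken from the paper itself.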
