

RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control

Xiang Deng · Zerong Zheng · Yuxiang Zhang · Jingxiang Sun · Chao Xu · Xiaodong Yang · Lizhen Wang · Yebin Liu

Arch 4A-E Poster #177
Wed 19 Jun 10:30 a.m. PDT — noon PDT


This paper advances the applicability of human avatar learning methods by proposing RAM-Avatar, which learns a Real-time, photo-realistic Avatar that supports full-body control from Monocular videos. RAM-Avatar uses two statistical templates to model facial expression and hand gesture variations, and applies a sparsely computed dual attention module on top of a separate body template to facilitate high-fidelity texture rendering of the torso and limbs. Building on this foundation, we deploy a lightweight yet powerful StyleUnet together with a temporal-aware discriminator to achieve real-time, realistic rendering. To enable robust animation for out-of-distribution poses, we propose a Motion Distribution Align module that compensates for discrepancies between the training and testing motion distributions. Extensive experiments in a variety of settings demonstrate the superiority of the proposed method, and we present a real-time live system to further push the research toward applications. The training and testing code will be released for research purposes.
