

Poster

Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning

Jihyun Lee · Weipeng Xu · Alexander Richard · Shih-En Wei · Shunsuke Saito · Shaojie Bai · Te-Li Wang · Minhyuk Sung · Tae-Kyun Kim · Jason Saragih


Abstract:

We present a novel framework for egocentric 3D motion tracking, focusing on learning high-fidelity social motions from head-mounted camera image inputs. Our approach leverages a motion diffusion model based on cascaded body-hand diffusion denoising for accurate whole-body motion tracking. We build our network upon a modified Transformer-based architecture using windowed relative-temporal attention to achieve better generalizability to arbitrary-length motions. Additionally, we propose a novel exemplar-based identity conditioning method to further boost tracking quality when prior information (i.e., anchor poses) about the target identity is available. In experiments, our framework achieves state-of-the-art whole-body motion tracking accuracy while enabling real-time inference (> 60 FPS) through diffusion distillation. Our supplementary video demonstrates that our approach estimates significantly more natural social motions from challenging (e.g., occluded or truncated) egocentric observations than existing state-of-the-art methods.
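To make the windowed relative-temporal attention idea concrete, below is a minimal PyTorch sketch of an attention layer restricted to a temporal window with a learned relative-offset bias. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the class and parameter names (WindowedRelativeTemporalAttention, window_size, rel_bias) are hypothetical. Because the bias depends only on the relative frame offset and attention is limited to a local window, the layer can be applied to motion sequences of arbitrary length at inference time.

```python
# Illustrative sketch only; names and hyperparameters are assumptions,
# not taken from the paper.
import torch
import torch.nn as nn


class WindowedRelativeTemporalAttention(nn.Module):
    """Self-attention over motion frames restricted to a temporal window,
    with a learned bias indexed by relative frame offset."""

    def __init__(self, dim: int, num_heads: int = 8, window_size: int = 16):
        super().__init__()
        self.num_heads = num_heads
        self.window_size = window_size
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learnable bias per head and per relative offset in [-(W-1), W-1].
        self.rel_bias = nn.Parameter(torch.zeros(num_heads, 2 * window_size - 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim)
        B, T, D = x.shape
        H, W = self.num_heads, self.window_size
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, H, -1).transpose(1, 2)  # (B, H, T, d_head)
        k = k.view(B, T, H, -1).transpose(1, 2)
        v = v.view(B, T, H, -1).transpose(1, 2)

        attn = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)  # (B, H, T, T)

        # Relative offset between every pair of frames, clamped to the window.
        offsets = torch.arange(T, device=x.device)
        rel = offsets[None, :] - offsets[:, None]                   # (T, T)
        bias = self.rel_bias[:, rel.clamp(-W + 1, W - 1) + W - 1]   # (H, T, T)
        attn = attn + bias.unsqueeze(0)

        # Mask out frame pairs farther apart than the window size.
        attn = attn.masked_fill(
            (rel.abs() >= W).unsqueeze(0).unsqueeze(0), float("-inf")
        )
        out = attn.softmax(dim=-1) @ v                              # (B, H, T, d_head)
        out = out.transpose(1, 2).reshape(B, T, D)
        return self.proj(out)


if __name__ == "__main__":
    layer = WindowedRelativeTemporalAttention(dim=256, num_heads=8, window_size=16)
    motion = torch.randn(2, 120, 256)   # 120 frames of pose features
    print(layer(motion).shape)          # torch.Size([2, 120, 256])
```

In a cascaded body-hand setup, one would expect such layers inside both the body denoiser and the hand denoiser, with the hand stage conditioned on the body stage's output; the exact cascade and the exemplar-based identity conditioning are described in the paper itself.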
