

RoHM: Robust Human Motion Reconstruction via Diffusion

Siwei Zhang · Bharat Lal Bhatnagar · Yuanlu Xu · Alexander Winkler · Petr Kadlecek · Siyu Tang · Federica Bogo

Arch 4A-E Poster #182
Thu 20 Jun 5 p.m. PDT — 6:30 p.m. PDT
Oral presentation: Orals 4C Action and motion
Thu 20 Jun 1 p.m. PDT — 2:30 p.m. PDT


We propose RoHM, an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos in the presence of noise and occlusions. Most previous approaches either train neural networks to directly regress motion in 3D or learn data-driven motion priors and combine them with optimization at test time. RoHM is a novel diffusion-based motion model that, conditioned on noisy and occluded input data, reconstructs complete, plausible motions in consistent global coordinates. Given the complexity of the problem -- requiring one to address different tasks (denoising and infilling) in different solution spaces (local and global motion) -- we decompose it into two sub-tasks and learn two models, one for global trajectory and one for local motion. To capture the correlations between the two, we then introduce a novel conditioning module, combining it with an iterative inference scheme. We apply RoHM to a variety of tasks -- from motion reconstruction and denoising to spatial and temporal infilling. Extensive experiments on three popular datasets show that our method outperforms state-of-the-art approaches qualitatively and quantitatively, while being faster at test time. The code is available at
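The abstract splits the problem into a global trajectory and local motion. The sketch below is a rough illustration of what such a split can look like. It is not RoHM's motion representation and does not include its diffusion models. It assumes motion is given as global 3D joint positions of shape (T, J, 3) and that joint 0 is the root (for example, the pelvis). It separates a root trajectory from root-relative local poses and then recombines them. The names `split_global_local` and `merge_global_local` are hypothetical.

```python
import numpy as np


def split_global_local(joints: np.ndarray, root_index: int = 0):
    """Split global joint positions (T, J, 3) into a root trajectory (T, 3)
    and root-relative local poses (T, J, 3).

    Assumption for illustration only: the root joint sits at `root_index`,
    and "local" means positions relative to that root in each frame.
    """
    if joints.ndim != 3 or joints.shape[-1] != 3:
        raise ValueError("expected an array of shape (T, J, 3)")
    # Global trajectory: the root joint's position in every frame.
    trajectory = joints[:, root_index, :].copy()
    # Local motion: every joint expressed relative to the root of its frame.
    local = joints - trajectory[:, None, :]
    return trajectory, local


def merge_global_local(trajectory: np.ndarray, local: np.ndarray) -> np.ndarray:
    """Recombine a root trajectory (T, 3) and local poses (T, J, 3)
    into global joint positions (T, J, 3)."""
    return local + trajectory[:, None, :]
```

Because the split is exactly invertible, a model operating on the trajectory and a model operating on the local poses can each work in its own space, and their outputs can be merged back into global coordinates. Any coupling between the two parts (which the abstract handles with a dedicated conditioning module and iterative inference) is outside the scope of this sketch.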
