Poster
InterMimic: Towards Learning Universal Human-Object Interaction Skills from Imperfect Motion Capture
Sirui Xu · Hung Yu Ling · Yu-Xiong Wang · Liangyan Gui
Achieving realistic simulations of humans engaging in a wide range of object interactions has long been a fundamental goal in animation. Extending physics-based motion imitation techniques to complex human-object interactions (HOIs) is particularly challenging due to the intricate coupling between human-object dynamics and the variability in object geometries and properties. Moreover, motion capture data often contain artifacts such as inaccurate contacts and insufficient hand detail, which hinder the learning process. We introduce InterMimic, a framework that overcomes these challenges by enabling a single policy to robustly learn from imperfect motion capture sequences spanning tens of hours of diverse full-body interaction skills with dynamic and varied objects. Our key insight is to employ a curriculum strategy: perfect first, then scale up. We first train subject-specific teacher policies to mimic, retarget, and refine the motion capture data, effectively correcting its imperfections. We then distill a student policy from these teachers, which act as online experts that provide direct supervision and supply clean references. This ensures that the student policy learns from high-quality guidance despite imperfections in the original dataset. Our experiments demonstrate that InterMimic produces realistic and diverse interactions across various HOI datasets. Notably, the learned policy exhibits zero-shot generalization, enabling seamless integration with kinematic generators and elevating the framework from mere imitation to generative modeling of interactions.
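To make the two-stage curriculum concrete, below is a minimal sketch (not the authors' code) of the "perfect first, then scale up" pipeline: stage 1 trains subject-specific teacher policies, and stage 2 distills a single student policy with the teachers acting as online experts along the student's own rollouts. All interfaces here are hypothetical placeholders: the toy environment, observation/action sizes, network shapes, and the DAgger-style imitation loss are assumptions for illustration only, not the paper's actual implementation.

```python
# Minimal sketch of the teacher-student curriculum; all names and sizes are assumed.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 64, 32  # placeholder observation/action sizes


def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class ToyHOIEnv:
    """Stand-in for a physics simulator rolling out one subject's mocap clips."""

    def reset(self):
        return torch.zeros(OBS_DIM)

    def step(self, action):
        # Returns next observation and an imitation reward; both are dummies here.
        return torch.randn(OBS_DIM), torch.tensor(0.0)


def train_teacher(env):
    """Stage 1: a subject-specific teacher that mimics and refines its clips.

    A real implementation would run RL (e.g., PPO) against an imitation reward
    computed from the (imperfect) mocap; that training loop is omitted here.
    """
    teacher = mlp(OBS_DIM, ACT_DIM)
    # ... RL training omitted ...
    return teacher


def distill_student(teachers, envs, epochs=10, horizon=64):
    """Stage 2: one student distilled from all teachers as online experts."""
    student = mlp(OBS_DIM, ACT_DIM)
    opt = torch.optim.Adam(student.parameters(), lr=3e-4)
    for _ in range(epochs):
        for teacher, env in zip(teachers, envs):
            obs = env.reset()
            for _ in range(horizon):
                student_act = student(obs)
                with torch.no_grad():
                    expert_act = teacher(obs)  # online supervision (DAgger-style)
                loss = nn.functional.mse_loss(student_act, expert_act)
                opt.zero_grad()
                loss.backward()
                opt.step()
                # Roll out the student's own action so it is supervised on the
                # states it actually visits, not only the teachers' states.
                obs, _ = env.step(student_act.detach())
    return student


if __name__ == "__main__":
    envs = [ToyHOIEnv() for _ in range(3)]        # one env per subject
    teachers = [train_teacher(e) for e in envs]   # stage 1: perfect first
    student = distill_student(teachers, envs)     # stage 2: scale up
```

The design point the sketch illustrates is that the teachers supply actions online along the student's own trajectories, so the student is never trained directly on the noisy mocap labels, only on the teachers' already-corrected behavior.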