InterPrior: A Scalable Motion Prior for Physics-Based Human-Object Interactions
Abstract
Humans rarely plan interactions with objects at the level of explicit whole-body movements. High-level intentions, such as affordances, define the goal, while coordinated balance, contact, and manipulation emerge naturally from underlying physical and motor priors. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills across diverse contexts while maintaining physically coherent whole-body coordination. To this end, we introduce InterPrior, a scalable framework that learns a unified control policy, i.e., an interaction motion prior, through large-scale imitation pretraining and reinforcement-learning post-training. InterPrior first distills a full-reference imitation expert into a versatile, goal-conditioned variational policy that reconstructs motion from multi-modal and partially specified goal cues. A targeted diversity process, combining data augmentation and physical perturbations, broadens exposure to varied contact and object conditions, producing a motion prior that generalizes beyond the training data. To address the vast configuration space of large-scale human-object interaction, a reinforcement-learning fine-tuning stage improves competence on unseen goals, enabling recovery from unsuccessful grasps. The resulting policy acts as a reusable motion prior that can absorb new behaviors, including interactions with unseen objects. We also show its effectiveness in user-interactive control and across different embodiments.