PhysHO: Physics-Based Dynamic 3D Gaussian Human and Object from Monocular Video
Abstract
Physically plausible reconstruction of human–object dynamics from a single video remains under-explored in physics-based methods. Most prior approaches omit human-generated internal actuation, assuming motion is driven solely by gravity and simple contacts, and they rely on idealized constitutive laws that underfit heterogeneous and anisotropic materials. To address these gaps, we introduce PhysHO, which tightly couples SMPL-driven Linear Blend Skinning (LBS) with a Material Point Method (MPM) simulator. Our key insight is to use LBS as an interpretable actuation prior and MPM to propagate the resulting forces through contact under physical constraints. Concretely, we derive targeted actuation with a PD controller guided by LBS trajectories and gate it per particle via a learnable LBS-impact factor, so that only particles inside the SMPL volume are directly actuated. We model real materials with residual neural constitutive laws layered on expert elastic–plastic models and conditioned per particle to capture heterogeneity and anisotropy. We stabilize monocular learning with structure-preserving 3D flow supervision and a progressive, loss-balanced training schedule. PhysHO reconstructs observed dynamics with high fidelity, predicts future motion, and simulates outcomes under novel human actions. Experimental results demonstrate robust human-driven interactions beyond gravity-only scenes. Our code will be released upon acceptance.
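To make the gated actuation concrete, the following is a minimal sketch of per-particle PD actuation toward LBS-driven targets, modulated by a learnable LBS-impact factor. The function name, gain values, and gating form are illustrative assumptions for exposition, not the paper's implementation; in practice such forces would enter the MPM particle-to-grid transfer as external forces.

```python
import numpy as np

def pd_actuation(x, v, x_target, v_target, impact_factor, kp=1e4, kd=1e2):
    """Per-particle PD actuation toward LBS targets, gated by an LBS-impact factor.

    x, v          : (N, 3) current particle positions and velocities
    x_target      : (N, 3) target positions from SMPL-driven LBS
    v_target      : (N, 3) target velocities (e.g., finite-differenced LBS trajectory)
    impact_factor : (N,)   learnable gate in [0, 1]; ~1 inside the SMPL volume, ~0 outside
    kp, kd        : proportional and derivative gains (assumed values)
    """
    # PD force pulls actuated particles toward the LBS trajectory.
    f_pd = kp * (x_target - x) + kd * (v_target - v)
    # Gate per particle: only particles inside the SMPL volume are directly
    # actuated; the rest move only through contact resolved by the MPM solver.
    return impact_factor[:, None] * f_pd

# Toy usage: the returned forces would be added as external forces in the MPM step.
N = 4
x = np.zeros((N, 3)); v = np.zeros((N, 3))
x_tgt = np.tile(np.array([0.01, 0.0, 0.0]), (N, 1))
v_tgt = np.zeros((N, 3))
gate = np.array([1.0, 1.0, 0.2, 0.0])  # hypothetical learned LBS-impact factors
forces = pd_actuation(x, v, x_tgt, v_tgt, gate)
```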