ReGenHOI: Unifying Reconstruction and Generation for 3D Human–Object Interaction Understanding
Abstract
Understanding 3D human–object interaction (HOI) involves two closely related abilities: reconstruction, which perceives observed geometry, and generation, which imagines plausible future interactions. However, most existing methods treat these abilities as separate tasks, limiting their capacity to capture the unified nature of human spatial reasoning. To address this, we propose a unified framework that bridges reconstruction and generation through a shared semantic–geometric reasoning space. Specifically, a 3D Contact Reasoning mechanism reasons directly in 3D space, jointly modeling geometric structure and semantic relationships, while a Reasoning Trace Refinement module iteratively refines contact predictions by integrating geometric and semantic cues. Through this explicit reasoning over human–object contact regions, the framework builds a unified latent representation. To further enhance the realism and physical plausibility of both reconstructed and generated outputs, we adapt the Gravity-Field Based Diffusion Bridge to refine fine-grained contact geometry and ensure smooth, physically consistent human–object engagement. Extensive experiments demonstrate that our unified framework significantly improves both reconstruction accuracy and generative interaction quality, establishing a cohesive and interpretable paradigm for 3D HOI understanding.