Poster
InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
Jinlu Zhang · Yixin Chen · Zan Wang · Jie Yang · Yizhou Wang · Siyuan Huang
Recent advances in 3D human-centric generation have made significant progress. However, existing methods still struggle to generate novel Human-Object Interactions (HOIs), particularly for open-set objects. We identify three main challenges of this task: precise human-object relation reasoning, adaptive affordance parsing for unseen objects, and realistic human pose synthesis that aligns with both the description and the 3D object geometry. In this work, we propose a novel zero-shot 3D HOI generation framework that leverages knowledge from large-scale pretrained models without training on task-specific datasets. More specifically, we first generate an initial human pose by sampling multiple hypotheses through multi-view SDS, conditioned on the input text and object geometry. We then use a pretrained 2D image diffusion model to parse unseen objects and extract contact points, avoiding the limitations imposed by existing 3D asset knowledge. Finally, we introduce a detailed optimization stage that produces fine-grained, precise, and natural interactions, enforcing realistic 3D contact between the involved body parts (including hands during grasping) and the 3D object. This is achieved by distilling relational feedback from LLMs to capture detailed human-object relations from the text inputs. Extensive experiments validate the effectiveness of our approach compared to prior work, particularly in terms of the fine-grained nature of the interactions and the ability to handle open-set 3D objects.
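The abstract describes a three-stage pipeline: pose initialization via multi-view SDS, affordance parsing with a 2D diffusion prior, and contact-aware refinement guided by LLM feedback. The sketch below is a minimal, hypothetical illustration of how such a pipeline could be orchestrated; every function name, signature, and data structure is an assumption for exposition, not the authors' implementation.

```python
# Hypothetical orchestration sketch of the three-stage zero-shot HOI pipeline
# outlined in the abstract. All names and signatures are illustrative
# assumptions, not the authors' released code.

from dataclasses import dataclass
from typing import Any, List


@dataclass
class PoseHypothesis:
    """A candidate body pose with a multi-view SDS plausibility score (assumed)."""
    body_params: List[float]   # e.g., flattened pose/shape parameters
    sds_score: float           # guidance score from multi-view SDS


def sample_pose_hypotheses(text: str, object_mesh: Any, n: int = 8) -> List[PoseHypothesis]:
    """Stage 1 (assumed): sample coarse human poses via multi-view SDS,
    conditioned on the text prompt and the object geometry."""
    raise NotImplementedError("placeholder for multi-view SDS sampling")


def parse_affordance(object_mesh: Any, text: str) -> Any:
    """Stage 2 (assumed): query a pretrained 2D image diffusion model on
    renderings of the unseen object and lift the parsed interaction
    regions to 3D contact points."""
    raise NotImplementedError("placeholder for 2D-diffusion affordance parsing")


def refine_with_llm_feedback(pose: PoseHypothesis, contacts: Any, text: str) -> Any:
    """Stage 3 (assumed): optimize the body (and hand) pose so that the body
    parts named by LLM-distilled relational feedback make realistic 3D
    contact with the extracted contact points."""
    raise NotImplementedError("placeholder for contact-aware optimization")


def generate_hoi(text: str, object_mesh: Any) -> Any:
    # 1. Coarse pose initialization: keep the hypothesis best aligned with
    #    the prompt and object geometry under the SDS score.
    hypotheses = sample_pose_hypotheses(text, object_mesh)
    init_pose = max(hypotheses, key=lambda h: h.sds_score)

    # 2. Open-set affordance parsing: contact points come from a 2D prior,
    #    so no 3D interaction dataset is required.
    contacts = parse_affordance(object_mesh, text)

    # 3. Fine-grained refinement with LLM relational feedback.
    return refine_with_llm_feedback(init_pose, contacts, text)
```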