

HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment

Juze Zhang · Jingyan Zhang · Zining Song · Zhanhe Shi · Chengfeng Zhao · Ye Shi · Jingyi Yu · Lan Xu · Jingya Wang

Arch 4A-E Poster #33
Award: Highlight
Wed 19 Jun 10:30 a.m. PDT — noon PDT

Abstract: Humans naturally interact with one another and with multiple surrounding objects while engaging in various social activities. However, recent advances in modeling human-object interactions mostly focus on isolated individuals and objects, owing to a fundamental scarcity of data. In this paper, we introduce HOI-M$^3$, a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects. Notably, it provides accurate 3D tracking of both humans and objects from dense RGB and object-mounted IMU inputs, covering 199 sequences and 181M frames of diverse humans and objects engaged in rich activities. Using the HOI-M$^3$ dataset, we introduce two novel data-driven tasks with accompanying strong baselines: monocular capture and unstructured generation of multiple human-object interactions. Extensive experiments demonstrate that our dataset is challenging and merits further research on multiple human-object interactions and behavior analysis. Our HOI-M$^3$ dataset, the corresponding code, and pre-trained models will be released to the community for future research.
