

Poster

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans

Huangyue Yu · Baoxiong Jia · Yixin Chen · Yandan Yang · Puhao Li · Rongpeng Su · Jiaxin Li · Qing Li · Wei Liang · Song-Chun Zhu · Tengyu Liu · Siyuan Huang


Abstract:

Embodied AI (EAI) research depends on high-quality and diverse 3D scenes to enable effective skill acquisition, sim-to-real transfer, and domain generalization. Recent 3D scene datasets remain limited in scalability due to their dependence on artist-driven designs and the difficulty of replicating the diversity of real-world objects. To address these limitations and automate the creation of simulatable 3D scenes, we present METASCENES, a large-scale 3D scene dataset constructed from real-world scans. It comprises 706 scenes with 15,366 objects spanning a wide array of types, arranged in realistic layouts with visually accurate appearances and physical plausibility. Leveraging recent advances in object-level modeling, we pair each scanned object with a curated set of candidate assets, ranked by human annotators for optimal replacement based on geometry, texture, and functionality. These annotations enable a novel multi-modal alignment model, SCAN2SIM, which supports automated, high-quality asset replacement. We further validate the utility of the dataset on two benchmarks: Micro-Scene Synthesis for small-object layout generation and cross-domain vision-language navigation (VLN). Results confirm the potential of METASCENES to advance EAI by supporting more generalizable agent learning and sim-to-real applications, opening new possibilities for EAI research.
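To make the asset-replacement idea concrete, the sketch below shows one way a cross-modal ranking model could be trained from human preference annotations, in the spirit of what the abstract describes for SCAN2SIM. This is a minimal illustration only: the module name (AssetRanker), feature dimensions, and the choice of a margin ranking loss are assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch (assumed design, not the paper's code): embed a scanned
# object and its candidate CAD assets into a shared space, score candidates by
# similarity, and train so the human-annotated best candidate outranks the rest.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AssetRanker(nn.Module):
    """Projects scan features and asset features into a shared embedding space."""
    def __init__(self, scan_dim=1024, asset_dim=512, embed_dim=256):
        super().__init__()
        self.scan_proj = nn.Sequential(
            nn.Linear(scan_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )
        self.asset_proj = nn.Sequential(
            nn.Linear(asset_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )

    def forward(self, scan_feat, asset_feats):
        # scan_feat: (B, scan_dim); asset_feats: (B, K, asset_dim)
        q = F.normalize(self.scan_proj(scan_feat), dim=-1)     # (B, D)
        k = F.normalize(self.asset_proj(asset_feats), dim=-1)  # (B, K, D)
        return torch.einsum("bd,bkd->bk", q, k)                # cosine score per candidate

model = AssetRanker()
scan_feats = torch.randn(4, 1024)       # hypothetical per-object scan features
cand_feats = torch.randn(4, 8, 512)     # 8 candidate assets per scanned object
scores = model(scan_feats, cand_feats)  # (4, 8)

# Assume the annotated best replacement sits at index 0 of each candidate list;
# a margin ranking loss pushes its score above every other candidate's.
best = scores[:, :1].expand_as(scores[:, 1:]).reshape(-1)
rest = scores[:, 1:].reshape(-1)
loss = F.margin_ranking_loss(best, rest, target=torch.ones_like(best), margin=0.2)
loss.backward()
```

At inference time, the same scoring pass would rank an object's candidate set directly, so the top-scoring asset can be substituted into the scene without per-object human review.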
