Skip to yearly menu bar Skip to main content


SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

Alexandros Delitzas · Ay├ža Takmaz · Federico Tombari · Robert Sumner · Marc Pollefeys · Francis Engelmann

Arch 4A-E Poster #342
[ ] [ Project Page ]
Thu 20 Jun 5 p.m. PDT — 6:30 p.m. PDT
Oral presentation: Orals 4B 3D Vision
Thu 20 Jun 1 p.m. PDT — 2:30 p.m. PDT


Existing 3D scene understanding methods are heavily focused on 3D semantic and instance segmentation. However, identifying objects and their parts only constitutes an intermediate step towards a more fine-grained goal, which is effectively interacting with the functional interactive elements (e.g., handles, knobs, buttons) in the scene to accomplish diverse tasks. To this end, we introduce SceneFun3D, a large-scale dataset with more than 14.8k highly accurate interaction annotations for 710 high-resolution real-world 3D indoor scenes. We accompany the annotations with motion parameter information, describing how to interact with these elements, and a diverse set of natural language descriptions of tasks that involve manipulating them in the scene context. To showcase the value of our dataset, we introduce three novel tasks, namely functionality segmentation, task-driven affordance grounding and 3D motion estimation, and adapt existing state-of-the-art methods to tackle them. Our experiments show that solving these tasks in real 3D scenes remains challenging despite recent progress in closed-set and open-set 3D scene understanding methods.

Live content is unavailable. Log in and register to view live content