FunFact: Building Probabilistic Functional 3D Scene Graphs via Factor-Graph Reasoning
Abstract
Recent work in 3D scene understanding has begun to shift from purely spatial analysis to the more complex challenge of functional scene understanding. However, existing methods often consider functional relationships between object pairs in isolation, failing to capture the scene-wide interdependencies that humans use to resolve ambiguity. We introduce FunFact, a framework for constructing probabilistic open-vocabulary functional 3D scene graphs from posed RGB-D images. FunFact first builds an object- and part-centric 3D map and uses foundation models to propose semantically plausible functional relations. These candidates are converted into factor graph variables and constrained by both LLM-derived common-sense priors and geometric priors. This formulation enables joint probabilistic inference over all functional edges and their uncertainties, yielding substantially better-calibrated confidence scores. To benchmark this setting, we also introduce FunThor, a synthetic dataset based on AI2-THOR with part-level geometry and systematically defined, rule-based functional annotations. Experiments on SceneFun3D, FunGraph3D, and FunThor show that FunFact improves node and relation discovery recall and significantly reduces calibration error for ambiguous relations, highlighting the benefits of holistic probabilistic modeling for functional scene understanding. We will release the code and dataset to facilitate future research.
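
To illustrate the idea of joint inference over candidate functional edges, the following is a minimal sketch, not the FunFact implementation. It assumes each candidate edge is a binary variable with a unary prior, and that scene-wide constraints are pairwise factors; marginals are computed by exhaustive enumeration, which is only feasible for toy graphs. The function name `edge_marginals`, the edge names, the prior values, and the soft-exclusivity weight are all hypothetical.

```python
from itertools import product


def edge_marginals(unary, pairwise):
    """Exact marginals for a tiny binary factor graph, by enumeration.

    unary:    dict mapping edge name -> prior probability that the edge exists
    pairwise: list of (edge_a, edge_b, table), where table[(xa, xb)] is a
              non-negative compatibility weight for that joint assignment
    """
    names = sorted(unary)
    totals = {n: 0.0 for n in names}
    z = 0.0  # partition function (sum of all unnormalized weights)
    for values in product((0, 1), repeat=len(names)):
        x = dict(zip(names, values))
        w = 1.0
        # Unary factors: prior belief in each candidate edge.
        for n in names:
            w *= unary[n] if x[n] else 1.0 - unary[n]
        # Pairwise factors: couple edges that should not be scored in isolation.
        for a, b, table in pairwise:
            w *= table[(x[a], x[b])]
        z += w
        for n in names:
            if x[n]:
                totals[n] += w
    return {n: totals[n] / z for n in names}


# Hypothetical example: one switch, two lamps, equal priors (assumed values).
priors = {"switch->lamp_A": 0.6, "switch->lamp_B": 0.6}

# Assumed soft constraint: the switch is unlikely to control both lamps.
soft_exclusive = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.1}
factors = [("switch->lamp_A", "switch->lamp_B", soft_exclusive)]

marginals = edge_marginals(priors, factors)
for name, p in marginals.items():
    print(f"{name}: {p:.3f}")
```

In this toy case the two edges are scored jointly, so the exclusivity factor pulls each marginal below its 0.6 prior, which is the kind of confidence adjustment that pairwise-in-isolation scoring cannot express. A practical system would replace enumeration with an approximate scheme such as belief propagation, since the number of joint assignments grows exponentially with the number of edges.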