Fusion of Depth and Semantics for Probabilistic Floorplan Localization
Abstract
Floorplan localization aims to estimate the camera pose of a query image with respect to a 2D floorplan, providing a lightweight and long-term stable alternative to localization based on 3D maps or large image databases for indoor robotics and AR. Recent methods frame the problem as ray-based matching: the image is represented as a set of rays annotated with depth or semantic labels, which are then aligned with the floorplan. However, these methods still struggle with the complexity of indoor environments, which can be decomposed into environmental, geometric, and semantic ambiguities. To address these ambiguities, we propose a floorplan-aware probabilistic fusion framework that models both depth and semantic information within a unified architecture. The framework combines a distribution-based ray confidence estimator, which down-weights uncertain geometric hypotheses, with a probabilistic semantic matching scheme based on Jensen–Shannon divergence (JSD), which preserves and leverages informative semantic ambiguity instead of collapsing it into hard labels. Experiments on challenging benchmarks demonstrate that our approach significantly outperforms prior methods in both robustness and accuracy.
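To make the JSD-based semantic matching concrete, the following is a minimal sketch, not the paper's implementation: it compares a ray's soft semantic prediction against candidate floorplan label distributions using the standard Jensen–Shannon divergence. The class set and all distribution values are hypothetical, chosen only for illustration.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2.
    Symmetric and bounded by ln(2); lower means a better match.
    """
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical soft prediction for one image ray over classes
# (wall, door, window), scored against the floorplan label
# distributions seen from two candidate poses.
ray_pred = [0.6, 0.3, 0.1]
pose_a   = [0.7, 0.2, 0.1]  # semantically consistent candidate
pose_b   = [0.1, 0.1, 0.8]  # semantically inconsistent candidate

# Soft labels are compared directly, so informative ambiguity is
# retained rather than collapsed into a hard argmax label.
assert jsd(ray_pred, pose_a) < jsd(ray_pred, pose_b)
```

In a full pipeline, such per-ray scores would be aggregated across rays into a pose likelihood; here the sketch only shows why a distribution-to-distribution divergence can rank candidate poses without discarding soft semantic evidence.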