LoD-Loc v3: Generalized Aerial Localization in Dense Cities using Instance Silhouette Alignment
Shuaibang Peng ⋅ Juelin Zhu ⋅ Xia Li ⋅ Kun Yang ⋅ Yu Liu ⋅ Maojun Zhang ⋅ Shen Yan
Abstract
We present LoD-Loc v3, a novel method for generalized aerial visual localization in dense urban environments. While prior work LoD-Loc v2 [89] achieves localization through semantic building silhouette alignment with low-detail city models, it suffers from two key limitations: poor cross-scene generalization and frequent failure in scenes with densely packed buildings. Our method addresses these challenges through two key innovations. First, we develop a new synthetic data generation pipeline that produces $\textbf{InsLoD-Loc}$, the largest instance segmentation dataset for aerial imagery to date, comprising 100k images with precise instance-level building annotations. Models trained on this dataset exhibit strong zero-shot generalization to unseen scenes. Second, we reformulate the localization paradigm by shifting from semantic to instance-level silhouette alignment, which significantly reduces pose estimation ambiguity in dense scenes. Extensive experiments demonstrate that LoD-Loc v3 outperforms existing state-of-the-art (SOTA) baselines by a large margin in both cross-scene and dense urban scenarios.