UniGeoRS: A Unified Benchmark for Tri-view Geo-Localization
Abstract
Cross-view geo-localization (CVGL) aims to estimate an image’s geographic location by matching it against geo-referenced images captured from different viewpoints, supporting applications such as autonomous driving, UAV navigation, and visual surveillance. However, because image collection is costly, current CVGL datasets often offer limited diversity in both drone and ground imagery, which constrains model generalization. Furthermore, existing methods focus primarily on either ground-to-satellite or drone-to-satellite matching and lack a unified framework for matching images across all three platforms: satellite, drone, and ground. To address these gaps, we introduce the Unified Geo-localization dataset with Real-world and Synthetic imagery (UniGeoRS), a comprehensive benchmark featuring satellite, drone, and ground-view images. UniGeoRS places particular emphasis on the richness and diversity of drone and ground perspectives, enabling more realistic and flexible evaluation of CVGL. Additionally, we propose Cross-Attention-based Matching Enhancement (CAME), a unified framework for CVGL. By dynamically aggregating contextual information from top-ranked candidates, CAME refines feature representations and enhances cross-view matching robustness. Experimental results show that (1) the proposed UniGeoRS benchmark is necessary for training and evaluating CVGL models across all three platforms; (2) UniGeoRS improves model generalization across diverse conditions; and (3) CAME consistently boosts the performance of state-of-the-art CVGL approaches.
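To make the refinement idea concrete, the following is a minimal sketch of aggregating context from top-ranked candidates with attention. It is an illustration under stated assumptions, not the CAME implementation: the abstract does not specify the architecture, so the single attention head, the absence of learned projections, the residual update, and all function names (`top_k_indices`, `refine_with_candidates`) are hypothetical choices made here for clarity.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(query, gallery, k):
    """Indices of the k gallery features most cosine-similar to the query."""
    q = l2_normalize(query)
    sims = [dot(q, l2_normalize(g)) for g in gallery]
    order = sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)
    return order[:k]


def refine_with_candidates(query, gallery, k=3):
    """Hypothetical refinement step: the query attends to its top-k candidates.

    Assumption: keys and values are the raw (normalized) candidate features;
    a real model would typically apply learned projections first.
    """
    q = l2_normalize(query)
    candidates = [l2_normalize(gallery[i]) for i in top_k_indices(query, gallery, k)]

    # Scaled dot-product attention weights over the retrieved candidates.
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, c) / scale for c in candidates])

    # Weighted sum of candidate features gives the aggregated context.
    context = [sum(w * c[d] for w, c in zip(weights, candidates))
               for d in range(len(q))]

    # Residual update, then re-normalize so cosine matching still applies.
    return l2_normalize([qd + cd for qd, cd in zip(q, context)])


# Toy usage: four 2-D gallery features and one query feature.
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [0.8, 0.2]
refined = refine_with_candidates(query, gallery, k=2)
```

In this toy setting the refined query is pulled toward its two nearest gallery features and away from the dissimilar ones, which is the intuition behind re-ranking with candidate context. The refined feature would then be matched against the gallery again in place of the original one.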