See What We Cannot See: A Geo-guided Reasoning Benchmark for Object Counting under Adverse Earth Observation Conditions
Abstract
Object counting in remote sensing imagery becomes challenging when visual cues are obscured by clouds, fog, shadows, or low-light conditions. Yet earth observation inherently provides complementary geo-modalities, including land-use and map data, which offer stable structural and contextual priors that remain available when appearance cues fail. In this paper, we introduce \textbf{GROC}, the first large-scale dataset for \textbf{G}eo-guided \textbf{R}easoning in \textbf{O}bject \textbf{C}ounting under adverse earth observation conditions. GROC contains 1.2 million point annotations over 14K images, each aligned with three modalities that preserve the original geospatial information. We also provide a data engine for collecting large-scale object counting datasets with multiple geo-modalities, realistic degradations, and reliable annotations. We further present a counting agent that adaptively leverages geo-modalities to produce reliable estimates. Extensive experiments show that existing models struggle to “see” through adverse conditions, whereas geo-modalities improve robustness. GROC establishes the first benchmark that explicitly challenges models to \textbf{see what they cannot see}, charting a new direction for geo-guided amodal reasoning in earth observation.