MSAG: A Multispectral Aerial–Ground Benchmark for Any-Scenario Person Re-Identification
Abstract
Recent person re-identification (ReID) methods leverage heterogeneous sensing across multiple modalities and viewpoints to improve robustness under diverse conditions. However, most approaches target predefined scenario pairs (e.g., visible–infrared or aerial–ground) and train separate task-specific models. In contrast, real-world applications require retrieving identities from galleries that cover all scenarios, making such designs inefficient and complex to deploy. To bridge this gap, we introduce Any-Scenario ReID (AS-ReID): given a query from any (modality, viewpoint) scenario, a single model retrieves the same identity from a heterogeneous gallery spanning all scenarios. Progress toward AS-ReID is limited by two factors: (i) the lack of a real-world-aligned benchmark with broad scenario coverage, and (ii) the difficulty of learning representations that are cohesive within identities and strongly discriminative across identities under diverse scenarios. To address these challenges, we construct MSAG, a Multispectral Aerial–Ground benchmark with 2,337 identities and 434,620 images captured by RGB, near-infrared, and thermal-infrared cameras on both ground and UAV platforms. MSAG spans day and night, multiple seasons, and varied weather conditions, and supports AS-ReID as well as conventional ReID tasks. We further propose the Unified Alignment and Discrimination (UAD) framework, in which Progressive Center Alignment (ProCA) aggregates multi-view features into modality centers and then aligns them toward identity centers to reduce scenario bias, while Global Prototype Discrimination (GPD) contrasts samples against global identity prototypes to enforce large-margin discrimination. Extensive experiments highlight the challenges of MSAG and demonstrate the effectiveness of UAD on AS-ReID. The dataset and code will be released.
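To make the two UAD objectives concrete before the full exposition, the following is a minimal PyTorch-style sketch of the ideas summarized above. It is an illustrative reading, not the released implementation: the function names (proca_loss, gpd_loss), the squared-distance and cosine formulations, and the margin and temperature values are assumptions introduced here for clarity.

```python
import torch
import torch.nn.functional as F

def proca_loss(feats, modality_ids, identity_ids):
    """Progressive Center Alignment (sketch): average multi-view features
    into per-(identity, modality) centers, then pull each modality center
    toward the corresponding identity center to reduce scenario bias."""
    loss, n = feats.new_zeros(()), 0
    for pid in identity_ids.unique():
        id_mask = identity_ids == pid
        # Modality centers: mean feature per modality for this identity.
        centers = torch.stack([
            feats[id_mask & (modality_ids == m)].mean(dim=0)
            for m in modality_ids[id_mask].unique()
        ])
        id_center = centers.mean(dim=0)  # identity center
        # Align modality centers toward the identity center.
        loss = loss + (centers - id_center).pow(2).sum(dim=1).mean()
        n += 1
    return loss / max(n, 1)

def gpd_loss(feats, identity_ids, prototypes, margin=0.3, temperature=0.1):
    """Global Prototype Discrimination (sketch): contrast each sample
    against all global identity prototypes, subtracting a margin from the
    positive logit to enforce large-margin discrimination."""
    feats = F.normalize(feats, dim=1)
    protos = F.normalize(prototypes, dim=1)
    logits = feats @ protos.t()  # cosine similarity to every prototype
    # Margin applied only to each sample's own-identity logit (assumed form).
    margins = F.one_hot(identity_ids, protos.size(0)).float() * margin
    return F.cross_entropy((logits - margins) / temperature, identity_ids)
```

In this reading, the "progressive" aspect of ProCA and the maintenance of the global prototype bank would follow the training schedule detailed in the method section; the sketch only captures the center-alignment and prototype-contrast objectives themselves.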