Spatial-Frequency Collaborative Learning for Occluded Visible-Infrared Person Re-Identification
Abstract
Occluded visible-infrared person re-identification (Occluded VI-ReID) remains challenging because modality heterogeneity and occlusions both break structural consistency and weaken cross-modality feature alignment. Existing methods rely mainly on spatial-domain cues such as local body parts and salient patches, whose discriminability degrades severely under varying imaging conditions or partial visibility. To address these issues, we introduce a spatial-frequency collaborative perspective that provides global perception and cross-location consistency. Specifically, we propose a Spatial-Frequency Collaborative Learning (SFCL) framework that exploits frequency information to complement spatial representations. SFCL comprises a Cross-Modality Frequency Alignment Module (CFAM), a Spatial-Frequency Interaction Module (SFIM), and a Frequency-Aware Discriminative (FAD) loss. CFAM models the spectral features of visible and infrared images in the frequency domain, establishing modality-consistent spectral priors. SFIM injects these priors into spatial features, promoting dual-domain interaction and complementary spatial-frequency representations. The FAD loss jointly enforces cross-modality frequency alignment and semantic consistency, enhancing robustness and discriminability under occlusion. For evaluation under realistic occlusion, we construct two occluded datasets, Occ-SYSU-MM01 and Occ-RegDB, on which SFCL outperforms state-of-the-art methods.
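To make the frequency-prior idea concrete, below is a minimal PyTorch sketch, not the paper's implementation: it assumes the "spectral feature" is the log-amplitude of a 2D FFT (phase-free, hence insensitive to the occluder's position) and that "injection" is a learned channel-wise gating of spatial backbone features. The names `FrequencyPrior` and `SpatialFrequencyFusion` are hypothetical and only loosely mirror the roles of CFAM and SFIM.

```python
# Hypothetical sketch of a frequency prior injected into spatial features.
# Assumptions (not from the paper): spectral feature = FFT log-amplitude;
# injection = residual channel-wise gating.
import torch
import torch.nn as nn


class FrequencyPrior(nn.Module):
    """Extracts a log-amplitude spectrum and projects it to a channel gate."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv maps the single-channel spectrum to a per-channel gate.
        self.proj = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, C, H, W); average channels to suppress modality color cues.
        gray = image.mean(dim=1, keepdim=True)
        spectrum = torch.fft.fft2(gray, norm="ortho")
        # Log-amplitude discards phase, so it is invariant to spatial shifts
        # such as where an occluder falls in the frame.
        amplitude = torch.log1p(spectrum.abs())
        return self.proj(amplitude)  # (B, channels, H, W), values in (0, 1)


class SpatialFrequencyFusion(nn.Module):
    """Modulates spatial backbone features with the frequency prior."""

    def __init__(self, channels: int):
        super().__init__()
        self.prior = FrequencyPrior(channels)

    def forward(self, image: torch.Tensor, spatial_feat: torch.Tensor) -> torch.Tensor:
        gate = self.prior(image)
        # Resize the gate to the spatial feature map before modulation.
        gate = nn.functional.interpolate(
            gate, size=spatial_feat.shape[-2:], mode="bilinear", align_corners=False
        )
        return spatial_feat * (1.0 + gate)  # residual-style injection


if __name__ == "__main__":
    fusion = SpatialFrequencyFusion(channels=64)
    img = torch.randn(2, 3, 256, 128)   # a visible or infrared input
    feat = torch.randn(2, 64, 64, 32)   # a backbone feature map
    print(fusion(img, feat).shape)      # torch.Size([2, 64, 64, 32])
```

The residual form `spatial_feat * (1 + gate)` is one plausible design choice: it lets the frequency branch amplify globally consistent structure without being able to zero out spatial evidence, which matters when parts of the body are occluded.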