Workshop: 21st Workshop on Perception Beyond the Visible Spectrum (PBVS'2025)
CSRN: Cross-Sensor Robust Recognition Network for Multi-modal Aerial View Object Classification
Hongli Liu
Target detection and classification in aerial imagery present significant challenges due to the scarcity of target information. Electro-Optical (EO) images offer limited resolution and degrade under adverse weather conditions. Synthetic Aperture Radar (SAR) images, on the other hand, enable effective detection in diverse weather and low-light environments, but suffer from speckle noise, which impairs the ability of deep learning models to extract meaningful features. Consequently, a single sensor alone may not achieve the desired accuracy. To address this challenge, we propose a Cross-Sensor Robust Recognition Network (CSRN) that leverages the complementary strengths of EO and SAR imagery to overcome their individual limitations and improve the performance of Automatic Target Recognition (ATR) systems. Specifically, we design a cross-modal domain adaptation framework that learns a domain-invariant feature space, effectively mitigating the domain discrepancy between the two sensor modalities. By strengthening cross-modal feature learning, the framework enhances both the robustness and the classification accuracy of the system. Experimental results on the PBVS MAVOC 2025 challenge dataset demonstrate the superiority of the proposed approach, which achieved \textbf{1st place} in SAR classification with a total score of \textbf{0.43} and a Top-1 accuracy of \textbf{31.78\%}. This framework provides a novel solution for improving multi-source information fusion and target recognition accuracy. The code for our CSRN framework is publicly available at: \url{https://github.com/HongliLiu1/CSRN_PBVS2025}.
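The abstract does not specify how CSRN measures the EO–SAR domain discrepancy, so as a purely illustrative sketch (not the paper's actual method), the snippet below shows one standard way such a discrepancy can be quantified and minimized: the squared Maximum Mean Discrepancy (MMD) between EO and SAR feature sets under an RBF kernel. All function names and the synthetic features are assumptions for illustration; a small MMD indicates the two modalities occupy a shared, near domain-invariant feature space.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    # Pairwise RBF kernel matrix between rows of x and rows of y.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(eo_feats, sar_feats, gamma=None):
    """Biased estimate of squared MMD between two feature sets.

    Illustrative domain-discrepancy measure (an assumption, not the
    method specified in the abstract). Smaller values mean the EO and
    SAR features are better aligned in a shared embedding space.
    """
    if gamma is None:
        gamma = 1.0 / eo_feats.shape[1]  # simple bandwidth heuristic
    k_xx = rbf_kernel(eo_feats, eo_feats, gamma).mean()
    k_yy = rbf_kernel(sar_feats, sar_feats, gamma).mean()
    k_xy = rbf_kernel(eo_feats, sar_feats, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy  # >= 0 for this biased estimator

# Synthetic stand-ins for EO/SAR embeddings: aligned vs. domain-shifted.
rng = np.random.default_rng(0)
aligned = mmd2(rng.normal(0, 1, (64, 16)), rng.normal(0, 1, (64, 16)))
shifted = mmd2(rng.normal(0, 1, (64, 16)), rng.normal(2, 1, (64, 16)))
print(aligned < shifted)  # a domain shift yields a larger discrepancy
```

In a training loop, a term like `mmd2` (or an adversarial domain classifier) would be added to the classification loss so the backbone is penalized for producing sensor-specific features, which is the general mechanism behind learning a domain-invariant space.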