RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability
Minh Kha Do · Kang Han · Phu Lai · Khoa T. Phan · Wei Xiang
Foundation models for remote sensing have garnered increasing attention for their strong performance across various observation tasks. However, current models lack robustness when managing diverse input types and handling incomplete data in downstream tasks. In this paper, we propose RobSense, a robust multi-modal foundation model for multi-spectral and Synthetic Aperture Radar (SAR) data. RobSense is designed with modular components and pre-trained on a large-scale dataset with a combination of temporal multi-modal and masked autoencoder strategies. As a result, it effectively supports diverse input types, from static to temporal and from uni-modal to multi-modal. To further handle incomplete data, we incorporate two uni-modal latent reconstructors that recover rich representations from incomplete inputs, addressing variability in spectral bands and irregularities in temporal sequences. Extensive experiments show that RobSense consistently outperforms state-of-the-art baselines on complete datasets across four input types for segmentation and classification tasks. Furthermore, RobSense's margin over the baselines widens considerably as the missing rate increases on incomplete datasets.
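To make the latent-reconstruction idea concrete, below is a minimal sketch of how a uni-modal latent reconstructor could recover representations from an incomplete multi-spectral input. This is not the authors' implementation: the module structure, dimensions, masking scheme, and training target (matching the frozen encoder's latents for the complete input) are all illustrative assumptions.

```python
# Hypothetical sketch of a uni-modal latent reconstructor (assumed design, not
# RobSense's actual code): given encoder latents of an input with missing
# spectral bands, recover latents close to those of the complete input.
import torch
import torch.nn as nn

class LatentReconstructor(nn.Module):
    """Refines latents of an incomplete input toward the latents the
    encoder would produce for the complete input (assumed objective)."""
    def __init__(self, dim: int = 256, depth: int = 2, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, z_incomplete: torch.Tensor) -> torch.Tensor:
        return self.blocks(z_incomplete)

# Toy usage: zero out random spectral-band tokens, then train the
# reconstructor to match the latents of the complete input.
encoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                        nn.Linear(256, 256))  # stand-in for the real encoder
recon = LatentReconstructor()

x = torch.randn(8, 12, 64)             # batch of 12 band tokens, 64-dim each
keep = torch.rand(8, 12) > 0.3         # ~30% of bands treated as missing
x_incomplete = x * keep.unsqueeze(-1)  # zero out the missing bands

with torch.no_grad():
    z_target = encoder(x)              # target: latents of the complete input
z_hat = recon(encoder(x_incomplete))   # recovered latents from partial input
loss = nn.functional.mse_loss(z_hat, z_target)
loss.backward()
```

A symmetric reconstructor for the temporal dimension (recovering latents of missing time steps in a sequence) would follow the same pattern, with tokens indexed by acquisition date rather than spectral band.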