Uncertainty-Aware Modality Fusion for Unaligned RGB-T Salient Object Detection
Abstract
Unaligned RGB-T salient object detection (SOD) remains challenging due to severe cross-modal spatial discrepancies and unreliable feature fusion. Existing methods often assume perfect alignment or rely on explicit geometric registration, which is computationally demanding and sensitive to cross-modal inconsistencies. To address these limitations, we propose an uncertainty-aware modality fusion network (UMFNet) that reformulates RGB-T SOD as an uncertainty-aware representation learning problem. Specifically, the proposed uncertainty alignment module (UAM) models pixel-wise features as Gaussian latent distributions to estimate local uncertainty and identify cross-modally consistent regions in the feature space, thereby achieving implicit alignment without explicit registration. Furthermore, the confidence-guided global modulation (CGM) mechanism leverages confidence maps derived from the uncertainty estimates to adaptively regulate the fusion of RGB and thermal features, enhancing salient cues in reliable regions while suppressing noisy or inconsistent information. Extensive experiments on five unaligned and three aligned RGB-T SOD benchmarks demonstrate that UMFNet achieves state-of-the-art performance across diverse alignment conditions.
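The confidence-guided fusion described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes each modality's encoder emits a per-pixel log-variance (the Gaussian latent view of the UAM), maps variance to a confidence score, and fuses RGB and thermal features by confidence-weighted averaging in the spirit of the CGM. The function names, the variance-to-confidence mapping, and the weighting scheme are all illustrative assumptions.

```python
import numpy as np


def confidence_from_logvar(log_var):
    """Map per-pixel log-variance to a confidence score in (0, 1].

    Higher predicted variance -> lower confidence. The exact mapping
    (here, exp(-sigma^2)) is an assumption; the paper only states that
    confidence maps are derived from the uncertainty estimates.
    """
    return np.exp(-np.exp(log_var))


def confidence_weighted_fuse(f_rgb, f_t, logvar_rgb, logvar_t, eps=1e-6):
    """Confidence-weighted fusion of RGB and thermal feature maps.

    f_rgb, f_t:          (C, H, W) feature maps from each modality.
    logvar_rgb, logvar_t: (H, W) per-pixel predicted log-variances.

    Pixels where one modality is more reliable (lower variance) draw the
    fused representation toward that modality, while high-uncertainty
    regions are down-weighted, mirroring the CGM idea of enhancing
    reliable cues and suppressing inconsistent ones.
    """
    c_rgb = confidence_from_logvar(logvar_rgb)
    c_t = confidence_from_logvar(logvar_t)
    w = c_rgb / (c_rgb + c_t + eps)  # (H, W) relative RGB weight
    return w[None] * f_rgb + (1.0 - w)[None] * f_t
```

For example, at pixels where the RGB branch predicts very low variance and the thermal branch very high variance, the fused feature is dominated by the RGB feature; when both branches are equally uncertain, the fusion reduces to a plain average.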