ReCoFuse: Ultra-Robust Image Fusion via Restorative Multi-Modal Diffusion Reciprocal Coupling
Abstract
Existing methods, which follow either the integrated hard-regression paradigm or the decoupled-optimization paradigm, exhibit limited fusion performance under complex degradations. To address these paradigm-level shortcomings, we propose ReCoFuse, an ultra-robust image fusion framework based on restorative multi-modal diffusion reciprocal coupling. ReCoFuse redefines the relationship between information restoration and information integration, deriving a novel reciprocal coupling optimization paradigm from their mutual reinforcement. It first constructs two restoration branches built on diffusion modules (DiM) to capture modality-specific restoration priors. Time-aware cross-modal integration modules (TIM) are then introduced as a bridge coupling restoration and integration: embedded at each DiM sampling timestep, they aggregate multi-modal information. The aggregated variable not only feeds back into each restoration branch to enhance degradation removal via cross-modal complementarity, but also generates high-quality fused images that comprehensively represent the scene. Moreover, an alternating regularization mechanism is designed to iteratively optimize DiM and TIM along the gradient path, ensuring effective collaboration between restoration and integration. Extensive experiments show that ReCoFuse achieves state-of-the-art performance under challenging degradations such as low light, haze, noise, low contrast, and stripes.
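To make the reciprocal coupling loop concrete, the following is a minimal toy sketch of the sampling procedure the abstract describes: two restoration branches take one denoising step per timestep, a time-aware integration step fuses them, and the fused variable feeds back into both branches. All functions, weights, and signals here (`dim_step`, `tim_fuse`, the feedback coefficient, the 1-D "clean priors") are hypothetical stand-ins, not the actual DiM/TIM networks of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10  # number of sampling timesteps (hypothetical)

# Toy "clean priors" and degraded observations for two modalities
clean_a, clean_b = np.ones(8), np.full(8, 2.0)
x_a = clean_a + rng.normal(0.0, 0.5, 8)
x_b = clean_b + rng.normal(0.0, 0.5, 8)

def dim_step(x, clean, t):
    """Stand-in for one DiM denoising step: pull x toward its modality prior."""
    return x + 0.3 * (clean - x)

def tim_fuse(a, b, t):
    """Stand-in TIM: blend the two branches; a real TIM conditions on t."""
    w = 0.5
    return w * a + (1.0 - w) * b

for t in reversed(range(T)):
    # each restoration branch removes degradation independently
    x_a = dim_step(x_a, clean_a, t)
    x_b = dim_step(x_b, clean_b, t)
    # cross-modal aggregation at this timestep
    fused = tim_fuse(x_a, x_b, t)
    # reciprocal coupling: the fused variable feeds back into both branches
    x_a = 0.9 * x_a + 0.1 * fused
    x_b = 0.9 * x_b + 0.1 * fused
```

The final `fused` array plays the role of the output fusion: it is produced jointly with, not after, restoration, which is the paradigm-level distinction from decoupled "restore then fuse" pipelines.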