FusionRegister: Every Infrared and Visible Image Fusion Deserves Registration
Abstract
Spatial registration across different visual modalities is a critical yet formidable step in multi-modality image fusion for real-world perception. Although several methods have been proposed to address this issue, existing joint registration-fusion methods typically require extensive pre-registration operations, limiting their efficiency. To overcome these limitations, we propose a general cross-modality registration method guided by visual priors for the multi-modality image fusion task, termed FusionRegister. First, FusionRegister achieves robustness by learning cross-modality misregistration representations rather than forcing alignment of all differences, ensuring stable outputs even under challenging input conditions. Moreover, FusionRegister demonstrates strong generality by operating directly on fused results, where misregistration is explicitly represented and effectively handled, enabling seamless integration with diverse fusion methods while preserving their intrinsic properties. In addition, its efficiency is further enhanced by treating the backbone fusion method as a natural visual-prior provider, which guides the registration process to focus only on regions affected by misregistration, thereby avoiding redundant operations. Extensive experiments on three datasets demonstrate that FusionRegister not only inherits the fusion quality of state-of-the-art methods but also delivers superior detail alignment, robustness, and adaptability, making it highly suitable for any infrared and visible image fusion method. The code is available in the supplementary material.