Progressive Multi-cue Alignment for Unaligned RGBT Tracking
Abstract
Unaligned RGBT tracking aims to achieve robust target localization across spatially misaligned RGB and thermal infrared (TIR) videos, a crucial challenge for applying RGBT tracking in real-world scenarios. Existing methods often estimate all cross-modal alignment parameters (i.e., spatial shift and scale change) simultaneously, but suffer from two major limitations: 1) they struggle to adapt to varying degrees of misalignment during tracking, and 2) they usually require complex models to handle challenging scenarios, resulting in a large computational burden. To overcome these limitations, we propose a novel Progressive Multi-cue Alignment framework, called PMATrack, which disentangles the estimation of cross-modal alignment parameters in a progressive manner and dynamically selects appropriate cues to handle different challenges, thereby enabling robust and efficient unaligned RGBT tracking. In particular, PMATrack divides cross-modal alignment parameter estimation into three stages that progressively perform center offset computation, scale transformation estimation, and global refinement. At each stage, we design a difficulty-aware router that adaptively selects the appropriate alignment expert based on the cross-modal alignment complexity, thereby reducing computational redundancy. In addition, we build a high-quality video benchmark, called MUART244, to facilitate comprehensive evaluation of unaligned RGBT tracking algorithms. Extensive experiments on our MUART244 and the public LasHeR-Unaligned datasets demonstrate the outstanding performance of PMATrack against existing state-of-the-art methods.
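The staged estimation described above can be illustrated with a minimal geometric sketch. This is not the paper's implementation: the stage order (center offset, scale transformation, refinement) and the router idea follow the abstract, but all function names, the box-based parameterization, and the routing threshold are illustrative assumptions.

```python
import numpy as np

# Boxes are (x, y, w, h) arrays. All internals here are hypothetical; the
# actual PMATrack experts are learned modules, not closed-form updates.

def center_offset(rgb_box, tir_box):
    """Stage 1 (sketch): shift the TIR box so its center matches the RGB box."""
    dx = (rgb_box[0] + rgb_box[2] / 2) - (tir_box[0] + tir_box[2] / 2)
    dy = (rgb_box[1] + rgb_box[3] / 2) - (tir_box[1] + tir_box[3] / 2)
    return np.array([tir_box[0] + dx, tir_box[1] + dy, tir_box[2], tir_box[3]])

def scale_transform(rgb_box, tir_box):
    """Stage 2 (sketch): rescale the TIR box about its center."""
    sx, sy = rgb_box[2] / tir_box[2], rgb_box[3] / tir_box[3]
    cx, cy = tir_box[0] + tir_box[2] / 2, tir_box[1] + tir_box[3] / 2
    w, h = tir_box[2] * sx, tir_box[3] * sy
    return np.array([cx - w / 2, cy - h / 2, w, h])

def needs_expert(rgb_box, tir_box, threshold=2.0):
    """Toy difficulty-aware router: invoke an expert only when the residual
    misalignment exceeds a threshold; otherwise fall through (identity)."""
    return np.abs(rgb_box - tir_box).max() > threshold

def progressive_align(rgb_box, tir_box):
    # Stage 3 (global refinement) is approximated by a second offset pass.
    stages = [center_offset, scale_transform, center_offset]
    for stage in stages:
        if needs_expert(rgb_box, tir_box):
            tir_box = stage(rgb_box, tir_box)
    return tir_box
```

Under this toy parameterization, easy frames whose boxes already nearly agree skip every expert, which mirrors the abstract's claim that routing reduces computational redundancy on low-difficulty inputs.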