Poster

Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection

Ting Li · Mao Ye · Tianwen Wu · Nianxin Li · Shuaifeng Li · Song Tang · Luping Ji


Abstract:

Thermal object detection is a critical task in fields such as surveillance and autonomous driving. Current state-of-the-art (SOTA) models typically leverage a prior Thermal-To-Visible (T2V) translation model to obtain visible spectrum information, followed by a cross-modality aggregation module that fuses information from both modalities. However, this fusion approach does not fully exploit the complementary visible spectrum information beneficial for thermal detection. To address this issue, we propose a novel cross-modal fusion method called Pseudo Visible Feature Fine-Grained Fusion (PFGF). Unlike previous cross-modal fusion methods, our approach explicitly models high-level relationships between cross-modal data, effectively fusing information at different granularities. Specifically, a graph is constructed whose nodes are generated from features at different levels of the backbone, along with pseudo-visible latent features produced by the T2V model; each feature corresponds to a subgraph. An Inter-Mamba block performs cross-modality fusion between nodes at the lowest scale, while a Cascade Knowledge Integration (CKI) strategy propagates the low-scale fused information into higher-scale subgraphs in a cascade manner. After several iterations of graph node updating, each subgraph outputs an aggregated feature to the detection head. Experimental results demonstrate that our method achieves SOTA detection performance and is more efficient. Code, data, and models will be released upon publication of this paper.
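The pipeline in the abstract (low-scale Inter-Mamba fusion, then cascaded propagation of the fused result into higher-scale subgraphs) can be sketched as follows. This is a minimal illustrative skeleton only: the function names `inter_mamba_fuse`, `cascade_integrate`, and `pfgf_forward` are hypothetical, and simple element-wise averaging and broadcast-addition stand in for the actual Inter-Mamba and CKI blocks, which the paper does not specify here.

```python
def inter_mamba_fuse(thermal_feat, pseudo_visible_feat):
    """Placeholder for the Inter-Mamba cross-modality fusion at the
    lowest scale; element-wise averaging stands in for the real block."""
    return [(t + v) / 2.0 for t, v in zip(thermal_feat, pseudo_visible_feat)]

def cascade_integrate(fused_low, high_feat):
    """Placeholder for Cascade Knowledge Integration (CKI): inject a
    summary of the low-scale fused result into a higher-scale subgraph."""
    summary = sum(fused_low) / len(fused_low)
    return [h + summary for h in high_feat]

def pfgf_forward(thermal_pyramid, pseudo_visible_lowest):
    """thermal_pyramid: per-level feature vectors, lowest scale first.
    pseudo_visible_lowest: pseudo-visible latent features at the lowest scale.
    Returns one aggregated feature per subgraph, one per detection-head input."""
    fused = inter_mamba_fuse(thermal_pyramid[0], pseudo_visible_lowest)
    outputs = [fused]
    for level_feat in thermal_pyramid[1:]:
        fused = cascade_integrate(fused, level_feat)  # cascade low -> high
        outputs.append(fused)
    return outputs

# Toy example: 3 pyramid levels with 4-dim features.
pyramid = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], [2.0, 2.0, 2.0, 2.0]]
pseudo = [3.0, 2.0, 1.0, 0.0]
outs = pfgf_forward(pyramid, pseudo)  # one output per pyramid level
```

The cascade direction (lowest scale first) mirrors the abstract's description of fusing low-scale information into high-scale subgraphs; in a real implementation each step would operate on feature maps and graph nodes rather than flat vectors.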
