Taming Generative Diffusion Models for Task-Oriented Infrared Imaging
Abstract
Infrared (IR) imaging is indispensable for perception in adverse environments, yet real-world IR data are often corrupted by dynamically coupled degradations that impair both visual quality and downstream semantic understanding. Although diffusion models offer powerful generative priors, existing approaches remain ill-suited to this setting: their slow multi-step sampling, their reliance on RGB-driven statistics misaligned with IR physics, and their need for costly full-parameter fine-tuning render them impractical for dynamic IR perception. We present a unified diffusion framework that reformulates IR restoration as a single-step generative process. The core idea is to associate each degraded input with a specific intermediate latent state on the diffusion trajectory, enabling the model to reconstruct the clean image in a single, direct reverse step. Physical realism is further reinforced through an IR-specific spectral regularization that preserves the characteristic energy distribution of thermal emissions. To address the diverse and rapidly shifting demands of dynamic IR perception, we further develop a task-aware low-rank adaptation mechanism: a lightweight prompting hypernetwork generates compact modulation parameters, enabling rapid, scalable adaptation without retraining the entire network. Comprehensive evaluations demonstrate that our framework attains state-of-the-art restoration performance, preserves reliable semantic structures, and adapts rapidly and effectively across diverse tasks and conditions.
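To make the two core ideas concrete, the following is a minimal PyTorch sketch of (i) single-step restoration, where the degraded input is treated as the diffusion state at a fixed intermediate timestep and recovered in one reverse step, and (ii) a Fourier-amplitude spectral regularizer of the kind described above. All names here (`denoiser`, `t_star`, `one_step_restore`, `spectral_loss`) are illustrative assumptions, not the paper's actual API; this is one plausible reading of the abstract, not the authors' implementation.

```python
# Hypothetical sketch; function and variable names are assumptions,
# not the paper's released code.
import torch
import torch.fft


def one_step_restore(denoiser, x_degraded, t_star, alphas_cumprod):
    """Treat the degraded image as the diffusion state at a fixed
    intermediate timestep t_star and take a single reverse step."""
    a_bar = alphas_cumprod[t_star]
    # The network predicts the noise component at t_star; the standard
    # DDPM x0-prediction formula then yields the clean estimate directly,
    # collapsing the usual multi-step sampling chain into one pass.
    eps_hat = denoiser(x_degraded, t_star)
    x0_hat = (x_degraded - torch.sqrt(1.0 - a_bar) * eps_hat) / torch.sqrt(a_bar)
    return x0_hat


def spectral_loss(x_hat, x_gt):
    """Assumed form of the IR-specific spectral regularization: match
    Fourier amplitude spectra so the characteristic energy distribution
    of thermal emissions is preserved in the restored image."""
    amp_hat = torch.abs(torch.fft.rfft2(x_hat, norm="ortho"))
    amp_gt = torch.abs(torch.fft.rfft2(x_gt, norm="ortho"))
    return torch.mean(torch.abs(amp_hat - amp_gt))
```

The key design choice this sketch highlights is that associating each degradation with a fixed intermediate state (`t_star`) is what permits a single reverse step, in contrast to the slow multi-step sampling the abstract identifies as impractical for dynamic IR perception.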