Language-Guided One-Step Diffusion Model for Nighttime Flare Removal
Abstract
Nighttime photography is susceptible to flare caused by strong light sources, which degrades visual quality and disrupts the structural information required by downstream vision tasks. Existing nighttime flare removal methods generally lack semantic priors for flare-occluded regions and thus tend to introduce artifacts and lose details under severe degradation. To address this problem, we propose a language-guided one-step diffusion framework that explicitly aligns flare-occluded regions with the underlying scene content at the semantic level. Specifically, we develop the first flare-specific vision–language model, Flare-VLM, which extracts fine-grained textual descriptions to guide one-step diffusion toward high-quality restoration of severely damaged regions. We further propose semantics-aware distribution distillation, which constrains the noise distribution with high-level semantics, suppressing redundant perturbations on clean backgrounds and improving the stability of distillation. In addition, we design an instruction-driven data synthesis pipeline that generates geometrically and semantically aligned nighttime flare samples, narrowing the gap between synthetic and real degradations. Experimental results demonstrate that the proposed method outperforms existing flare removal approaches in restoration quality and improves the performance of downstream vision tasks.