SFR-Net: Steering-Fusion-Refining Network in Multi-label Zero-Shot Sewer Defect Detection
Abstract
Because data annotation is prohibitively expensive and defect categories cannot be exhaustively enumerated, municipal sewer pipe defect detection poses significant generalization challenges for traditional models. Multi-Label Zero-Shot Learning (ML-ZSL) offers a viable way to address this challenge. However, existing methods struggle to establish robust, fine-grained visual-semantic alignment between the complex visual environment inside the pipes and the often sparse semantic descriptions, which leads to a critical problem: Alignment Ambiguity. To mitigate it, we propose a novel Steering-Fusion-Refining Network (SFR-Net) that follows a three-stage paradigm to progressively resolve this ambiguity. First, the Representation Steering (RS) module integrates a parameter-efficient feature steering mechanism that continuously adapts the representation to the pipe scene. Next, the Multi-Granularity Evidence Fusion (MEF) module aggregates unambiguous multi-granularity visual evidence through decoupled global and local paths. Finally, the Generalized Relational Score Refining (GR) module learns relational logic from seen defects and transfers it as a general score-correction capability, directly refining the preliminary prediction scores and substantially improving the model's zero-shot generalization and prediction consistency. Extensive experiments on the public Sewer-ML dataset and our private WZ-Pipe dataset demonstrate that SFR-Net achieves state-of-the-art (SOTA) performance on the multi-label zero-shot learning task.
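To make the data flow of the three stages concrete, the following is a toy sketch under heavily simplified assumptions, not the SFR-Net implementation. It assumes steering is an element-wise scale-and-shift of a frozen feature, that fusion blends a global cosine score with the best-matching local patch score via a weight `alpha`, and that refinement adds relation-weighted scores from the other labels via a weight `beta`. The relation matrix here is computed directly from label-embedding similarity rather than learned from seen defects; that substitution, and every function name (`steer`, `fuse_scores`, `relation_matrix`, `refine_scores`), is hypothetical. Because relations come from label embeddings, unseen labels receive corrections the same way seen ones do, which mirrors the transfer idea at a very coarse level.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def steer(feature: Vector, scale: Vector, shift: Vector) -> Vector:
    """Stage 1 (assumed form): element-wise scale-and-shift of a frozen feature."""
    return [f * s + b for f, s, b in zip(feature, scale, shift)]


def fuse_scores(global_feat: Vector, patch_feats: List[Vector],
                label_embs: List[Vector], alpha: float = 0.5) -> List[float]:
    """Stage 2 (assumed form): blend the global-path score with the best local-patch score."""
    scores = []
    for emb in label_embs:
        g = cosine(global_feat, emb)
        loc = max(cosine(p, emb) for p in patch_feats)
        scores.append(alpha * g + (1.0 - alpha) * loc)
    return scores


def relation_matrix(label_embs: List[Vector]) -> List[List[float]]:
    """Label-to-label relations from embedding similarity, with the diagonal zeroed."""
    n = len(label_embs)
    return [[0.0 if i == j else cosine(label_embs[i], label_embs[j])
             for j in range(n)] for i in range(n)]


def refine_scores(scores: List[float], relations: List[List[float]],
                  beta: float = 0.1) -> List[float]:
    """Stage 3 (assumed form): add relation-weighted evidence from the other labels."""
    return [s + beta * sum(r * t for r, t in zip(row, scores))
            for s, row in zip(scores, relations)]


# Toy usage: three label embeddings (seen and unseen are treated identically).
labels = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
g = steer([0.8, 0.2, 0.1], scale=[1.0, 1.0, 1.0], shift=[0.0, 0.0, 0.0])
patches = [[0.1, 0.0, 0.9], [0.7, 0.3, 0.0]]
prelim = fuse_scores(g, patches, labels)
final = refine_scores(prelim, relation_matrix(labels))
```

Keeping the refinement additive means setting `beta` to zero recovers the preliminary scores exactly, which makes the correction step easy to ablate in a sketch like this.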