
Poster

CLIP-driven Coarse-to-fine Semantic Guidance for Fine-grained Open-set Semi-supervised Learning

Xiaokun Li · Yaping Huang · Qingji Guan


Abstract:

Fine-grained open-set semi-supervised learning (OSSL) addresses a practical scenario in which unlabeled data may contain fine-grained out-of-distribution (OOD) samples. Because of the subtle visual differences among in-distribution (ID) samples, as well as between ID and OOD samples, separating ID from OOD samples is extremely challenging. Recent vision-language models, such as CLIP, have shown excellent generalization capabilities; however, they tend to focus on general attributes and are therefore insufficient for distinguishing fine-grained details. To tackle these issues, in this paper we propose a novel CLIP-driven coarse-to-fine semantic-guided framework, named CFSG-CLIP, which progressively filters and focuses on distinctive fine-grained clues. Specifically, CFSG-CLIP comprises a coarse-guidance module and a fine-guidance module, both derived from the pre-trained CLIP model. In the coarse-guidance module, we design a semantic filtering strategy that initially filters local visual features under cross-modality guidance. In the fine-guidance module, we further design a visual-semantic injection strategy that embeds category-related visual cues into the visual encoder to refine the local visual features. Through this dual-guidance framework, subtle local cues are progressively discovered to distinguish the fine differences between ID and OOD samples. Extensive experiments demonstrate that CFSG-CLIP not only improves the reliability of fine-grained semi-supervised training, but also achieves competitive performance on multiple fine-grained datasets.
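The two guidance stages described above can be pictured as patch-level filtering followed by text-conditioned pooling. Below is a minimal sketch of that idea in PyTorch, assuming patch-level visual features and class text embeddings have already been extracted with a frozen CLIP model; the function names, the `keep_ratio` parameter, and the pooling scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of coarse-to-fine cross-modal guidance on CLIP features.
# Assumes patch features and class text embeddings come from a frozen CLIP
# model; random tensors stand in for them here. Names are hypothetical.
import torch
import torch.nn.functional as F


def semantic_filter(patch_feats, text_feats, keep_ratio=0.5):
    """Coarse guidance: keep the patches most aligned with any class prompt.

    patch_feats: (B, N, D) local visual features (e.g., ViT patch tokens)
    text_feats:  (C, D) class text embeddings from the CLIP text encoder
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    # Cross-modal similarity of every patch to every class prompt: (B, N, C)
    sim = patch_feats @ text_feats.t()
    # Score each patch by its best-matching class, then keep the top-k patches.
    scores = sim.max(dim=-1).values                       # (B, N)
    k = max(1, int(patch_feats.size(1) * keep_ratio))
    idx = scores.topk(k, dim=1).indices                   # (B, k)
    return torch.gather(
        patch_feats, 1, idx.unsqueeze(-1).expand(-1, -1, patch_feats.size(-1))
    )


def visual_semantic_injection(filtered_feats, text_feats):
    """Fine guidance: pool filtered patches into a category-aware cue vector.

    The pooled vector could be injected into the visual encoder, e.g. as an
    extra prompt token; that wiring is omitted here.
    """
    filtered_feats = F.normalize(filtered_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    attn = (filtered_feats @ text_feats.t()).softmax(dim=1)  # (B, k, C)
    # Text-conditioned weighted pooling per class, averaged into one vector.
    pooled = attn.transpose(1, 2) @ filtered_feats           # (B, C, D)
    return pooled.mean(dim=1)                                # (B, D)


if __name__ == "__main__":
    B, N, C, D = 4, 196, 10, 512           # batch, patches, classes, feat dim
    patches = torch.randn(B, N, D)         # stand-in for CLIP patch features
    texts = torch.randn(C, D)              # stand-in for CLIP text features
    kept = semantic_filter(patches, texts, keep_ratio=0.25)
    cue = visual_semantic_injection(kept, texts)
    print(kept.shape, cue.shape)           # (4, 49, 512) and (4, 512)
```

In this reading, the coarse stage discards patches that no class prompt attends to, and the fine stage re-weights the survivors by class relevance, so ID/OOD separation can rest on the most discriminative local regions rather than on CLIP's global image embedding.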
