Poster Sat, Jun 6, 2026 • 10:45 AM – 12:45 PM PDT ExHall F 572

Parameter-Efficient Semantic Augmentation for Enhancing Open-Vocabulary Object Detection

Weihao Cao ⋅ Runqi Wang ⋅ Xiaoyue Duan ⋅ Jinchao Zhang ⋅ Ang Yang ⋅ Liping Jing

Abstract

Open-vocabulary object detection (OVOD) enables models to detect any object category, including unseen ones. Benefiting from large-scale pre-training, existing OVOD methods achieve strong detection performance in general scenarios (e.g., OV-COCO) but suffer severe performance drops when transferred to downstream tasks with substantial domain shifts. This degradation stems from the scarcity and weak semantics of category labels in domain-specific tasks, as well as the inability of existing models to capture auxiliary semantics beyond coarse-grained category labels. To address these issues, we propose HSA-DINO, a parameter-efficient semantic augmentation framework for enhancing open-vocabulary object detection. Specifically, we design a multi-scale prompt bank that leverages image feature pyramids to capture hierarchical semantics and select domain-specific local semantic prompts, progressively enriching textual representations from coarse to fine-grained levels. Furthermore, we introduce a semantic-aware router that dynamically selects the appropriate semantic augmentation strategy during inference, thereby preventing parameter updates from degrading the generalization ability of the pre-trained OVOD model.
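
The abstract does not specify how prompts are selected or how the router decides, so the following is only an illustrative sketch of the general idea, not the HSA-DINO implementation. It assumes (hypothetically) that each pyramid level is pooled into a query vector, that each level has its own bank of (key, prompt) pairs, that selection is top-k by cosine similarity, and that the router is a simple similarity threshold that leaves the text embedding untouched when no prompt matches well. The names `select_prompts`, `augment_text`, `k`, and `threshold` are all invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_prompts(query, bank, k):
    """Return the k prompts whose keys best match the query, plus the best score.

    bank is a list of (key, prompt) pairs; this top-k rule is an assumption.
    """
    ranked = sorted(bank, key=lambda kp: cosine(query, kp[0]), reverse=True)
    top = ranked[:k]
    best_score = cosine(query, top[0][0])
    return [prompt for _, prompt in top], best_score


def augment_text(text_emb, pyramid_queries, banks, k=1, threshold=0.5):
    """Enrich a text embedding level by level, coarse to fine.

    Hypothetical router: a level is skipped when its best key similarity is
    below the threshold, so unmatched inputs keep the original embedding.
    """
    out = list(text_emb)
    for query, bank in zip(pyramid_queries, banks):
        prompts, best_score = select_prompts(query, bank, k)
        if best_score < threshold:
            continue  # router falls back to the un-augmented representation
        mean = [sum(p[i] for p in prompts) / len(prompts) for i in range(len(out))]
        out = [o + m for o, m in zip(out, mean)]
    return out


# Toy two-level example with 2-D embeddings (values are made up).
banks = [
    [([1.0, 0.0], [0.1, 0.0]), ([0.0, 1.0], [0.0, 0.9])],  # coarse level
    [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.0, 0.2])],  # fine level
]
in_domain = augment_text([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], banks)
no_match = augment_text([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], banks)
```

In this toy setup, `in_domain` receives one prompt per level, while `no_match` is returned unchanged because no key clears the threshold, which mirrors the stated goal of keeping the pre-trained model's behavior intact when augmentation is not appropriate.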