Expert-Teacher-Student Collaborative Learning for Domain Adaptive Object Detection
Abstract
Domain adaptive object detection (DAOD) aims to generalize an object detector trained on a source domain to a target domain, where the domain gap degrades detection performance. Recently, large-scale vision foundation models (VFMs), pretrained on web-scale datasets, have exhibited such powerful generalization capabilities that many approaches leverage them to bridge the domain gap. However, their generalized knowledge is not tailored to a specific domain, which makes it difficult for them to offer precise guidance in the target domain. In this paper, we propose an Expert-Teacher-Student collaborative learning (ETS) framework that synergizes the generalized knowledge of VFMs with the domain-specific knowledge of the teacher model. Concretely, we first design an Expert-Teacher Collaborative Teaching (ETCT) module, which leverages the complementary knowledge of the expert and teacher models to collaboratively generate high-quality pseudo labels for supervising the student model. Second, we devise an Expert-Teacher Joint Consolidating (ETJC) module, which introduces class-wise prototype alignment among the expert, teacher, and student models to jointly consolidate generalized and domain-specific knowledge within the student model. ETS uses VFMs as the expert model as a free lunch, thus avoiding significant additional training costs. Extensive experiments show that our method outperforms existing state-of-the-art (SOTA) methods on three benchmarks. Our code is available in the supplementary materials.
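As a rough illustration of what class-wise prototype alignment can look like, the following minimal sketch computes one prototype per class and measures how far the student's prototypes are from a reference model's. It is not the ETJC implementation: the abstract does not specify how prototypes are built or which distance is used, so the per-class mean, the cosine distance, and all function names (`class_prototypes`, `cosine_distance`, `prototype_alignment_loss`) are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Return {class_label: mean feature vector} (assumed prototype definition)."""
    sums = {}
    counts = defaultdict(int)
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        # Accumulate the feature vector element-wise for this class.
        sums[label] = [s + f for s, f in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def prototype_alignment_loss(student_protos, reference_protos):
    """Average cosine distance over classes present in both prototype sets."""
    shared = sorted(set(student_protos) & set(reference_protos))
    if not shared:
        return 0.0
    total = sum(
        cosine_distance(student_protos[c], reference_protos[c]) for c in shared
    )
    return total / len(shared)


# Toy example: three 2-D instance features with class labels 0, 0, 1.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
protos = class_prototypes(features, labels)
self_loss = prototype_alignment_loss(protos, protos)
```

In a three-model setting such as the one described, a loss of this kind could be evaluated once against the expert's prototypes and once against the teacher's, then summed; how the terms are actually combined and weighted is not stated in the abstract.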