AgentDet: A Shared-Blackboard Multi-Agent Framework for Zero-/Few-Shot Object Detection
Abstract
Large multimodal language models have made rapid progress on vision–language tasks, yet their potential for zero-/few-shot object detection (ZSOD/FSOD) under a closed set of target classes remains underexplored. ZSOD/FSOD is hampered by data scarcity and catastrophic forgetting. Although vision–language models (VLMs) report strong results on several benchmarks, they typically rely on massive visual pretraining, which is misaligned with FSOD’s goal of testing generalization to novel classes under limited supervision. We introduce AgentDet, a shared-blackboard multi-agent framework that unifies ZSOD and FSOD via pseudo-incremental learning. AgentDet decouples detection into four cooperating roles—Agent-Scout, Agent-Pinner, Agent-Curator, and Agent-Judge—which collaboratively maintain a Shared Blackboard and a Knowledge Base. For efficiency, we train only Agent-Judge, updating its image encoder and LLM-based detection head, yielding a lightweight recipe that encourages generalization to previously unseen categories. On the PASCAL VOC and MS COCO ZSOD/FSOD protocols, AgentDet delivers highly competitive performance, achieving state-of-the-art results in several settings. Ablations confirm the contributions of blackboard collaboration, safe-write policies, and the pseudo-incremental schedule.