Universal Guideline-Driven Image Clustering via a Hybrid LLM Agent
Abstract
Unifying image clustering across scenarios remains challenging because of fundamental gaps among tasks. We introduce a Guideline-Driven Image Clustering Agent, the first universal framework that bridges these gaps through textual guidelines. To incorporate complex guidelines without task-specific training, we propose Generative Concept Proxy Modeling, which extracts concept proxies and uses them to generate guideline-aware embeddings. For scenarios that require automatic cluster discovery, we introduce MST-based LLM Traversal, which applies LLM reasoning only to complex semantic judgments and thereby reduces computational cost. Our method generalizes across diverse clustering scenarios: from general to fine-grained categorization, from global to local criteria, and from balanced to long-tail distributions. Across these tasks, it consistently outperforms specialized state-of-the-art methods.
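To make the idea of selective LLM reasoning over a minimum spanning tree (MST) concrete, the following is a minimal sketch of one possible reading, not the procedure used in this work. It assumes embeddings are already available as plain coordinate tuples, builds an MST with Prim's algorithm, and consults a judge only for edges whose length falls between two thresholds. The names `cluster_with_selective_judge`, `low`, `high`, and `judge` are hypothetical; `judge` stands in for an LLM call, and Euclidean distance is an assumed choice of metric.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Edge = Tuple[float, int, int]  # (weight, u, v)


def minimum_spanning_tree(points: List[Point]) -> List[Edge]:
    """Prim's algorithm on the complete Euclidean graph over `points`."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n       # tree node that provides that connection
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def cluster_with_selective_judge(
    points: List[Point],
    low: float,
    high: float,
    judge: Callable[[int, int], bool],
) -> Tuple[List[int], int]:
    """Merge across short MST edges, cut long ones, ask `judge` for the rest.

    `judge(u, v)` is a hypothetical stand-in for an LLM decision on whether
    items u and v belong together. Returns (labels, number_of_judge_calls).
    """
    root = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    judge_calls = 0
    for weight, u, v in sorted(minimum_spanning_tree(points)):
        if weight <= low:        # clearly similar: merge without the judge
            same = True
        elif weight >= high:     # clearly different: cut without the judge
            same = False
        else:                    # ambiguous edge: defer to the judge
            judge_calls += 1
            same = judge(u, v)
        if same:
            root[find(u)] = find(v)
    return [find(i) for i in range(len(points))], judge_calls
```

In this sketch the number of clusters is not fixed in advance; it emerges from which MST edges are kept. The judge is invoked at most once per ambiguous edge, so its cost scales with the number of uncertain links rather than with all pairs of items.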