TAR: Token-Aware Refinement for Fine-grained Generalized Category Discovery
XingYu Yang ⋅ Yu Zhang ⋅ Siya Mi ⋅ Xiu-Shen Wei
Abstract
Given an unlabelled dataset containing both known and unknown categories, Generalized Category Discovery (GCD) aims to classify samples from the known categories accurately while simultaneously discovering the unknown categories. Current GCD methods have achieved significant progress on coarse-grained datasets but still struggle to generalize to fine-grained scenarios. We observe that attention artifacts, a phenomenon in which the attention map exhibits abnormally high responses concentrated on a few tokens, significantly interfere with fine-grained GCD. In this paper, we argue that attention artifacts compel the model to overemphasize global semantics and consequently overlook the fine-grained local cues that are crucial for category discrimination. We propose the $\textbf{T}$oken-$\textbf{A}$ware $\textbf{R}$efinement ($\textbf{TAR}$) framework, which introduces a plug-and-play module to mitigate the impact of attention artifacts and to improve the model's concentration on local information. TAR departs from the conventional classification paradigm that relies solely on the first token as input to the classifier. Instead, it exploits the entire token sequence, thereby enhancing the model's focus on fine-grained local information. Extensive experiments demonstrate the superior performance of TAR across various benchmarks.
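The abstract does not specify how the TAR module works, so the following is only a minimal illustrative sketch of the two ideas it names: flagging tokens with abnormally high attention, and feeding the classifier more than the first token. The function names, the median-based threshold, and the mean pooling are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import median


def find_artifact_tokens(attn, ratio=5.0):
    """Return indices of patch tokens whose attention weight exceeds
    ratio * median(attn). The threshold rule is a hypothetical choice."""
    cutoff = ratio * median(attn)
    return [i for i, w in enumerate(attn) if w > cutoff]


def pool_patch_tokens(tokens, skip):
    """Mean-pool patch token vectors, leaving out the indices in `skip`."""
    kept = [t for i, t in enumerate(tokens) if i not in skip]
    if not kept:  # fall back to all tokens if everything was flagged
        kept = tokens
    dim = len(kept[0])
    return [sum(t[d] for t in kept) / len(kept) for d in range(dim)]


def classifier_input(first_token, tokens, attn, ratio=5.0):
    """Concatenate the first token with pooled, artifact-free patch tokens,
    instead of passing the first token alone (an assumed design)."""
    skip = set(find_artifact_tokens(attn, ratio))
    return list(first_token) + pool_patch_tokens(tokens, skip)


# Toy example: five patch tokens, one of which dominates the attention map.
attn = [0.02, 0.03, 0.85, 0.04, 0.06]
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [1.0, 1.0], [2.0, 2.0]]
first_token = [0.5, 0.5]

artifacts = find_artifact_tokens(attn)
features = classifier_input(first_token, tokens, attn)
```

In this toy input, only the token at index 2 is flagged, and the classifier input combines the first token with the mean of the remaining four patch tokens. A real implementation would operate on transformer attention maps and learned embeddings rather than hand-written lists.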