CompetitorFormer: Mitigating Query Conflicts for 3D Instance Segmentation via Competitive Strategy
Abstract
Transformer-based approaches have recently become the dominant paradigm for 3D instance segmentation. These methods typically employ a multi-layer decoder that iteratively refines a set of learnable queries into instance mask predictions. However, we observe that multiple queries often target the same instance simultaneously, yielding fragmented masks for a single object. We term this phenomenon \emph{inter-query competition}; it slows convergence and limits segmentation accuracy. To address it, we present \textbf{CompetitorFormer}, a framework that mitigates inter-query competition in Transformer-based methods by explicitly modeling the competitive relationships among queries. Specifically, we introduce a \emph{Query Competition Layer} before each decoder stage to construct a dynamic competitive landscape, allowing each query to perceive its relative importance. In addition, the proposed \emph{Relative Relationship Encoding} and \emph{Rank Cross-Attention} modules enhance self-attention and cross-attention, respectively, by prioritizing dominant queries. Extensive experiments show that our approach converges faster and achieves superior performance on the ScanNetV2, ScanNet++V2, ScanNet200, and S3DIS datasets.