

Unsupervised Salient Instance Detection

Xin Tian · Ke Xu · Rynson W.H. Lau

Arch 4A-E Poster #247
Wed 19 Jun 10:30 a.m. PDT — noon PDT


The significant manual effort required to annotate pixel-level labels has driven the advancement of unsupervised saliency learning. However, without supervision signals, state-of-the-art methods can only infer region-level saliency. In this paper, we propose to explore the unsupervised salient instance detection (USID) problem for a more fine-grained visual understanding. Our key observation is that self-supervised transformer features may exhibit local similarities as well as different levels of contrast to other regions, which provide informative cues for identifying salient instances. Hence, we propose SCoCo, a novel network that models saliency coherence and contrast for USID. SCoCo includes two novel modules: (1) a global background adaptation (GBA) module with a scene-level contrastive loss, which extracts salient regions from the scene by searching for an adaptive “saliency threshold” in the self-supervised transformer features, and (2) a locality-aware similarity (LAS) module with an instance-level contrastive loss, which groups salient regions into instances by modeling in-region saliency coherence and cross-region saliency contrast. Extensive experiments show that SCoCo outperforms state-of-the-art weakly-supervised SID methods and carefully designed unsupervised baselines, and performs comparably to fully-supervised SID methods.
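The abstract describes the two stages (separate salient regions from the background with an adaptive threshold, then group salient regions into instances by coherence and contrast) only at a high level. The toy sketch below illustrates that general idea under loud assumptions. It is not the authors' SCoCo, GBA, or LAS module and involves no learning or contrastive loss. It swaps the learned "saliency threshold" for Otsu's method applied to a hand-crafted contrast score (one minus a patch's mean cosine similarity to all other patches). It swaps the LAS module for a 4-neighbour flood fill that merges adjacent salient patches whose cosine similarity is at least a hypothetical `tau`. The patch features are toy vectors standing in for self-supervised transformer features, and every function name is hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def contrast_scores(feats: List[Vector]) -> List[float]:
    """Hypothetical saliency cue: patches unlike the rest of the scene score high.

    Assumes at least two patches.
    """
    n = len(feats)
    scores = []
    for i in range(n):
        sims = [cosine(feats[i], feats[j]) for j in range(n) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


def otsu_threshold(values: List[float]) -> float:
    """Stand-in for an adaptive threshold: the cut maximising between-class variance."""
    vals = sorted(values)
    best_t, best_var = vals[0], -1.0
    for k in range(1, len(vals)):
        lo, hi = vals[:k], vals[k:]
        w0, w1 = len(lo) / len(vals), len(hi) / len(vals)
        m0, m1 = sum(lo) / len(lo), sum(hi) / len(hi)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, (vals[k - 1] + vals[k]) / 2
    return best_t


def detect_instances(feats: List[Vector], h: int, w: int, tau: float = 0.9) -> List[int]:
    """Label each patch of an h x w grid with an instance id, or -1 for background."""
    scores = contrast_scores(feats)
    t = otsu_threshold(scores)
    salient = [s > t for s in scores]
    labels = [-1] * (h * w)
    next_id = 0
    for start in range(h * w):
        if not salient[start] or labels[start] != -1:
            continue
        # Flood fill: grow the instance through adjacent, coherent salient patches;
        # low similarity (contrast) between neighbours stops the growth.
        labels[start] = next_id
        stack = [start]
        while stack:
            p = stack.pop()
            r, c = divmod(p, w)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    q = rr * w + cc
                    if salient[q] and labels[q] == -1 and cosine(feats[p], feats[q]) >= tau:
                        labels[q] = next_id
                        stack.append(q)
        next_id += 1
    return labels


if __name__ == "__main__":
    # Toy 3 x 4 grid: background everywhere except two touching objects in row 1.
    BG, A, B = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
    grid = [BG] * 4 + [A, A, B, B] + [BG] * 4
    print(detect_instances(grid, 3, 4))
    # → [-1, -1, -1, -1, 0, 0, 1, 1, -1, -1, -1, -1]
```

In the usage example, both objects clear the adaptive threshold because they contrast with the background, yet they receive different ids even though they touch, because their features are not coherent with each other. This mirrors, in a very simplified form, the distinction the abstract draws between region-level and instance-level saliency.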
