Poster

VCR: Learning Appearance-Invariant Representation for Open-World Instance Segmentation

Chang-Bin Zhang · Jinhong Ni · Yujie Zhong · Kai Han


Abstract:

In this paper, we address the challenging problem of open-world instance segmentation. Existing works have shown that vanilla visual networks are biased toward learning appearance information, e.g., texture, to recognize objects. This implicit bias causes the model to fail in detecting novel objects with unseen textures in the open-world setting. To address this challenge, we propose a learning framework, called View-Consistent leaRning (VCR), which compels the model to learn appearance-invariant representations for robust instance segmentation. In VCR, we first introduce additional views for each image, where the texture undergoes significant alterations while the image's underlying structure is preserved. We then encourage the model to learn appearance-invariant representations by enforcing consistency between object features across different views, for which we obtain class-agnostic object proposals using off-the-shelf unsupervised models that possess strong object-awareness. These proposals enable cross-view object feature matching, greatly reducing appearance dependency while enhancing object-awareness. We thoroughly evaluate our VCR on public benchmarks under both cross-class and cross-dataset settings, achieving state-of-the-art performance.
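The cross-view consistency idea from the abstract can be illustrated with a minimal sketch: pool one embedding per class-agnostic proposal from each view's feature map, then penalize disagreement between matched embeddings. Everything here is an assumption for illustration — the function names, the mean-pooling of box regions, and the cosine-distance objective are placeholders; the paper's actual feature extractor, proposal source, and loss may differ.

```python
import numpy as np

def roi_mean_pool(feat, boxes):
    # feat: (C, H, W) feature map from one view (hypothetical shape).
    # boxes: list of (x0, y0, x1, y1) proposal boxes in feature-map coords.
    # Returns one mean-pooled embedding per proposal, shape (N, C).
    return np.stack(
        [feat[:, y0:y1, x0:x1].mean(axis=(1, 2)) for x0, y0, x1, y1 in boxes]
    )

def view_consistency_loss(feat_a, feat_b, boxes):
    # Sketch of a view-consistency objective (assumed cosine distance):
    # the same proposals are pooled in both views, embeddings are
    # L2-normalized, and the loss is the mean (1 - cosine similarity).
    za = roi_mean_pool(feat_a, boxes)
    zb = roi_mean_pool(feat_b, boxes)
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(za * zb, axis=1)))
```

If the two views produce identical features, the loss is zero; a texture-altered view that changes object embeddings drives the loss up, pushing the backbone toward appearance-invariant features.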
