Learning and Aligning Click-Aware Shape Prior for Interactive Amodal Instance Segmentation
Abstract
Amodal instance segmentation aims to segment both the visible and the occluded regions of an object instance, which is challenging because occluded regions provide no direct visual evidence to support inference. Most existing methods employ prior knowledge about object masks (shape priors) to support amodal estimation, but the shape prior is not always compatible with the object instances encountered at test time. In this paper, we explore the task of interactive amodal segmentation, where a few user clicks are available to better segment the complete masks of object instances. For this task, we propose a novel framework based on learning and aligning a click-aware shape prior. Specifically, we propose to learn the click-aware shape prior with a triplet loss, which forces the retrieved shape priors to have higher IoU with the ground-truth mask of the target instance and thus to better support the prediction. Besides, considering the inevitable mismatch between the shape prior and the target instance, we propose to adaptively align the shape prior with deformable attention. Overall, our model makes full use of the interactive clicks to retrieve and align shape priors, and can thus estimate more complete masks. Extensive experiments on three benchmark datasets (i.e., KINS, D2SA and COCOA-cls) demonstrate the effectiveness of our method.
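To make the first mechanism concrete, the sketch below illustrates how a triplet loss can drive click-aware shape-prior retrieval: the anchor is a click-conditioned query embedding, and the positive/negative priors are chosen by their mask IoU with the ground-truth amodal mask. This is a minimal illustrative sketch and not the authors' released implementation; the helper names (`select_triplet`, `click_aware_prior_loss`) and the codebook-of-prior-masks setup are assumptions for illustration.

```python
# Minimal sketch (not the authors' code) of learning a click-aware shape
# prior with a triplet loss. Assumed setup: `anchor_embed` is a query
# embedding computed from image features and user clicks; `prior_masks`
# and `prior_embeds` form a codebook of candidate shape priors.
import torch

def iou(mask_a: torch.Tensor, mask_b: torch.Tensor) -> torch.Tensor:
    """IoU between two binary masks of shape (H, W)."""
    inter = (mask_a * mask_b).sum()
    union = mask_a.sum() + mask_b.sum() - inter
    return inter / union.clamp(min=1e-6)

def select_triplet(gt_mask, prior_masks, prior_embeds):
    """Pick positive/negative priors by IoU with the ground-truth amodal mask."""
    ious = torch.stack([iou(gt_mask, m) for m in prior_masks])
    pos = prior_embeds[ious.argmax()]  # prior most similar to the GT mask
    neg = prior_embeds[ious.argmin()]  # prior least similar to the GT mask
    return pos, neg

triplet_loss = torch.nn.TripletMarginLoss(margin=0.2)

def click_aware_prior_loss(anchor_embed, gt_mask, prior_masks, prior_embeds):
    """Triplet loss pulling the click-conditioned query toward high-IoU
    priors and pushing it away from low-IoU ones, so that retrieval
    returns priors that better match the target instance."""
    pos, neg = select_triplet(gt_mask, prior_masks, prior_embeds)
    return triplet_loss(anchor_embed.unsqueeze(0),
                        pos.unsqueeze(0),
                        neg.unsqueeze(0))
```

Under this loss, nearest-neighbor retrieval in the learned embedding space tends to return priors whose masks overlap well with the target instance, which is the property the abstract attributes to the click-aware shape prior.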