

Multi-view Aggregation Network for Dichotomous Image Segmentation

Qian Yu · Xiaoqi Zhao · Youwei Pang · Lihe Zhang · Huchuan Lu

Arch 4A-E Poster #363
award Highlight
Wed 19 Jun 10:30 a.m. PDT — noon PDT


Dichotomous Image Segmentation (DIS) has recently emerged, aiming to segment highly accurate objects from high-resolution natural images. When designing an effective DIS model, the main challenge is balancing the semantic dispersion of high-resolution targets in a small receptive field against the loss of high-precision details in a large receptive field. Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete global localization and local refinement. Inspired by the human visual system, which captures regions of interest by observing them from multiple views, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet), which unifies the feature fusion of the distant view and close-up views into a single stream with one encoder-decoder structure. With the help of the proposed multi-view complementary localization and refinement modules, our approach establishes long-range, profound visual interactions across multiple views, allowing the features of the detailed close-up views to focus on refining highly accurate details. Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
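The multi-view idea above can be illustrated with a minimal sketch: a high-resolution image is decomposed into one downsampled "distant" view for global localization and several full-resolution "close-up" crops for detail refinement, all at a common spatial size so a single encoder can process them in one batch. The helper name `make_views` and the stride-based downsampling are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def make_views(image, patch_grid=2):
    """Hypothetical helper: split a square image into one downsampled
    'distant' view plus patch_grid x patch_grid full-resolution
    'close-up' views, all with the same spatial shape."""
    h, w = image.shape[:2]
    ph, pw = h // patch_grid, w // patch_grid
    # Distant view: naive stride-based downsampling to the patch size
    # (a real model would use proper resizing, e.g. bilinear).
    distant = image[::patch_grid, ::patch_grid]
    # Close-up views: non-overlapping crops at native resolution.
    closeups = [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
                for i in range(patch_grid) for j in range(patch_grid)]
    # All views share one shape, so a single encoder can batch them.
    return distant, closeups

img = np.arange(8 * 8).reshape(8, 8)
distant, closeups = make_views(img, patch_grid=2)
print(distant.shape, len(closeups), closeups[0].shape)  # (4, 4) 4 (4, 4)
```

In MVANet, the features of these views are then fused within a single encoder-decoder stream, with the distant view guiding localization and the close-up views refining boundaries.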
