Skip to yearly menu bar Skip to main content


Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration

Yixuan Sun · Zhangyue Yin · Haibo Wang · Yan Wang · Xipeng Qiu · Weifeng Ge · Wenqiang Zhang

Arch 4A-E Poster #237
[ ]
Thu 20 Jun 5 p.m. PDT — 6:30 p.m. PDT


Establishing precise semantic correspondence across object instances in different images is a fundamental and challenging task in computer vision. In this task, difficulty arises often due to three challenges: confusing regions with similar appearance, inconsistent object scale, and indistinguishable nearby pixels. Recognizing these challenges, our paper proposes a novel semantic matching pipeline named LPMFlow toward extracting fine-grained semantics and geometry layouts for building pixel-level semantic correspondences. LPMFlow consists of three modules, each addressing one of the aforementioned challenges. The layout-aware representation learning module uniformly encodes source and target tokens to distinguish pixels or regions with similar appearances but different geometry semantics. The progressive feature superresolution module outputs four sets of 4D correlation tensors to generate accurate semantic flow between objects in different scales. Finally, the matching flow integration and refinement module is exploited to fuse matching flow in different scales to give the final flow predictions. The whole pipeline can be trained end-to-end, with a balance of computational cost and correspondence details. Extensive experiments based on benchmarks such as SPair-71K, PF-PASCAL, and PF-WILLOW have proved that the proposed method can well tackle the three challenges and outperform the previous methods, especially in more stringent settings. Code is available at

Live content is unavailable. Log in and register to view live content