
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Feng Lu · Xiangyuan Lan · Lijun Zhang · Dongmei Jiang · Yaowei Wang · Chun Yuan

Arch 4A-E Poster #211
Thu 20 Jun 5 p.m. PDT — 6:30 p.m. PDT


Over the past decade, most methods in visual place recognition (VPR) have used neural networks to produce feature representations. These networks typically compute a global representation of a place image from that image alone, neglecting cross-image variations (e.g., viewpoint and illumination changes), which limits their robustness in challenging scenes. In this paper, we propose a robust global representation method with cross-image correlation awareness for VPR, named CricaVPR. Our method uses the self-attention mechanism to correlate multiple images within a batch. These images can be taken in the same place under different conditions or viewpoints, or even captured at different places. Our method can therefore use cross-image variations as a cue to guide representation learning, yielding more robust features. To further improve robustness, we propose a multi-scale convolution-enhanced adaptation method that adapts pre-trained visual foundation models to the VPR task, introducing multi-scale local information to further enhance the cross-image correlation-aware representation. Experimental results show that our method outperforms state-of-the-art methods by a large margin with significantly less training time. Our method achieves 94.5% R@1 on Pitts30k using 512-dim global features. The code will be publicly available.
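The core idea of correlating descriptors across a batch can be illustrated with a minimal sketch. The code below is a hypothetical NumPy illustration, not the authors' implementation: it applies single-head self-attention over the batch dimension so each image's global descriptor attends to every other image's descriptor, with the projection matrices (which would be learned in practice) drawn at random for demonstration.

```python
import numpy as np

def cross_image_attention(feats, d_k=None, seed=0):
    """Correlate per-image descriptors across a batch via self-attention.

    feats: (N, D) array, one global descriptor per image in the batch.
    Returns refined (N, D) descriptors in which each image's representation
    has attended to every other image in the batch.
    """
    N, D = feats.shape
    d_k = d_k or D
    rng = np.random.default_rng(seed)
    # Hypothetical projections; in a real model these are learned weights.
    Wq = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wk = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wv = rng.standard_normal((D, D)) / np.sqrt(D)
    Q, K, V = feats @ Wq, feats @ Wk, feats @ Wv
    # (N, N) cross-image affinity matrix: attention runs over the batch axis.
    scores = Q @ K.T / np.sqrt(d_k)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over batch images
    return feats + attn @ V                   # residual connection

# Toy batch: 4 images, each with an 8-dim global descriptor.
batch = np.random.default_rng(1).standard_normal((4, 8))
out = cross_image_attention(batch)
print(out.shape)
```

Because the attention matrix is computed among all images in the batch, descriptors of the same place under different conditions can reinforce each other, while descriptors from different places provide contrastive context.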