Poster
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
Ethan Griffiths · Maryam Haghighat · Simon Denman · Clinton Fookes · Milad Ramezani
We present HOTFormerLoc, a novel and versatile Hierarchical Octree-based TransFormer, for large-scale 3D place recognition in both ground-to-ground and ground-to-aerial scenarios across urban and forest environments. Leveraging an octree-based structure, we propose a multi-scale attention mechanism that captures spatial and semantic features across granularities. To address the variable density of point distributions from common spinning lidar, we present cylindrical octree attention windows to better reflect the underlying distribution during attention. We introduce relay tokens to enable efficient global-local interactions and multi-scale representation learning at reduced computational cost. Our pyramid attentional pooling then synthesises a robust global descriptor for end-to-end place recognition in challenging environments. In addition, we introduce our novel dataset: In-house, a 3D cross-source dataset featuring point cloud data from aerial and ground lidar scans captured in dense forests. Point clouds in In-house contain representational gaps and distinctive attributes such as varying point densities and noise patterns, making it a challenging benchmark for cross-view localisation in the wild. Our results demonstrate that HOTFormerLoc achieves a top-1 average recall improvement of 5.5% to 11.5% on the In-house benchmark. Furthermore, it consistently outperforms SOTA 3D place recognition methods, with an average performance gain of 5.8% on well-established urban and forest datasets. The code and In-house benchmark will be made available upon acceptance.
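To illustrate why cylindrical windows suit spinning lidar, the sketch below converts Cartesian points to cylindrical coordinates and groups them into windows. It is not the authors' implementation: the uniform binning in `window_index` stands in for the octree construction, and the function names and the parameters `rho_step`, `n_sectors` and `z_step` are hypothetical. The point it shows is that a fixed angular sector spans more metres at larger range, so windows grow where the point cloud is sparser.

```python
import math
from collections import defaultdict


def to_cylindrical(points):
    """Convert (x, y, z) points to (rho, theta, z) about the vertical axis."""
    out = []
    for x, y, z in points:
        rho = math.hypot(x, y)    # horizontal range from the sensor
        theta = math.atan2(y, x)  # azimuth in [-pi, pi]
        out.append((rho, theta, z))
    return out


def window_index(rho, theta, z, rho_step, n_sectors, z_step):
    """Hypothetical window assignment: uniform bins in cylindrical space."""
    r_bin = int(rho // rho_step)
    # Map azimuth to [0, n_sectors); the modulo folds theta == pi into bin 0.
    t_bin = int((theta + math.pi) / (2 * math.pi) * n_sectors) % n_sectors
    z_bin = math.floor(z / z_step)
    return (r_bin, t_bin, z_bin)


def group_into_windows(points, rho_step=2.0, n_sectors=8, z_step=1.0):
    """Group Cartesian points by their cylindrical window index."""
    windows = defaultdict(list)
    for point, (rho, theta, z) in zip(points, to_cylindrical(points)):
        key = window_index(rho, theta, z, rho_step, n_sectors, z_step)
        windows[key].append(point)
    return dict(windows)


def sector_arc_length(rho, n_sectors):
    """Width in metres of one angular sector at horizontal range rho."""
    return rho * 2 * math.pi / n_sectors
```

With these assumed defaults, `sector_arc_length(40.0, 8)` is ten times `sector_arc_length(4.0, 8)`, so a window far from the sensor covers a proportionally wider area than one close to it.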