Simple but Effective Triplet-Based Compression Strategies for Compact Visual Localization
Abstract
Visual localization, i.e., the problem of estimating the camera pose from which an image was taken, is an important part of applications such as augmented reality and autonomous robots. Many of these applications require a compact memory footprint. Thus, a considerable amount of work has been spent on designing memory-efficient scene representations for visual localization. In this paper, we focus on compressing the 3D structure of the scene by selecting a subset of points from a Structure-from-Motion (SfM) point cloud. In contrast to prior work, which aims to solve (complex) optimization problems, we propose a simple strategy that is almost trivial to implement. Our compression strategy is based on the idea of selecting triplets of points such that the camera pose of each database image (used to build the SfM point cloud) can be accurately estimated from these triplets. Despite its simplicity, our strategy performs similarly to or better than current state-of-the-art structure compression approaches. Combined with standard product quantization approaches to compress feature descriptors, our approach compares favorably with recent learning-based approaches for compact visual localization.
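The core idea, keeping only a few points per database image while reusing points shared across images, can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the function name `select_triplets`, the greedy reuse heuristic, and the toy visibility data are all illustrative assumptions:

```python
# Illustrative sketch (not the authors' exact method): for each database
# image, greedily keep a triplet of its visible 3D points, preferring points
# already selected for other images so the compressed map stays small.

def select_triplets(observations):
    """observations: dict mapping image_id -> list of 3D point ids visible
    in that image. Returns the compressed set of kept point ids."""
    kept = set()
    for img, pts in observations.items():
        # Rank already-kept points first (False < True under sort),
        # so shared points are reused before new ones are added.
        ranked = sorted(pts, key=lambda p: p not in kept)
        kept.update(ranked[:3])  # retain a triplet for this image
    return kept

# Toy example: three images with partially overlapping visible points.
obs = {
    "img0": [1, 2, 3, 4],
    "img1": [3, 4, 5, 6],
    "img2": [5, 6, 7, 8],
}
print(sorted(select_triplets(obs)))  # 7 of the 8 points are retained
```

In a real pipeline, the triplet for each image would additionally be chosen so that the image's camera pose can still be estimated accurately from those points, rather than by visibility overlap alone.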