TACO: Task-Aware Contrastive Learning for Joint LiDAR Localization and 3D Object Detection
Abstract
Reliable navigation and decision-making in autonomous vehicles require both accurate localization and object detection. Traditionally, these two tasks are handled separately, leading to redundant computation and limited cross-task knowledge transfer. This paper proposes TACO, the first Task-Aware COntrastive learning framework, which performs joint LiDAR localization and 3D object detection within a single, unified network. TACO leverages contrastive learning to explicitly decouple and align static geographic features for localization and object-centric features for detection. This mutual supervision not only enhances localization robustness in dynamic environments by suppressing noise from dynamic objects, but also boosts detection accuracy through effective use of spatial context. Additionally, we propose OxfoLD, the first dataset to provide multi-traversal LiDAR localization ground truth together with rich 3D object annotations, supporting validation of both tasks across diverse times and weather conditions. Experimental results demonstrate that TACO achieves state-of-the-art localization accuracy while maintaining competitive detection performance. The code and dataset will be released.