Generalizable Structure-Aware Keypoint Correspondence for Category-Unified 3D Single Object Tracking
Abstract
3D single object tracking (SOT) in point clouds is a fundamental component of autonomous perception, but it remains challenging due to sparse observations, irregular geometry, and frequent occlusion. Most prior methods follow a category-specific paradigm that requires a separate model for each object type. This design limits scalability and generalization, since real-world object categories vary widely in scale and structure. In this work, we present UniKPT, a category-unified and structure-aware framework that performs robust 3D tracking across diverse object classes without relying on category priors. UniKPT introduces three key innovations: (1) an adaptive structural keypoint extractor that identifies scale-consistent and semantically meaningful points; (2) a progressive correspondence aligner that enforces hierarchical geometric consistency across frames; and (3) a confidence-aware localization module that refines tracking by suppressing uncertain correspondences and exploiting structural relations among keypoints. Experiments on the nuScenes and KITTI benchmarks show that a single UniKPT model generalizes across categories and also outperforms state-of-the-art category-specific trackers, with gains of +4.37% in Success and +5.16% in Precision on nuScenes.