Real-World Point Tracking with Verifier-Guided Pseudo-Labeling
Abstract
Models for long-term point tracking are typically trained on large synthetic datasets. Their performance degrades on real-world videos, whose characteristics differ from synthetic data and which lack dense ground-truth annotations. Self-training on unlabeled videos has been explored as a practical solution, but the quality of pseudo-labels depends strongly on the reliability of teacher predictions, which varies across frames and scenes. In this paper, we address the problem of real-world fine-tuning and introduce Verifier, a meta-model that learns to assess the reliability of tracker predictions and guide pseudo-label generation. Given candidate trajectories from multiple pretrained trackers, the verifier evaluates them per frame and selects the most trustworthy predictions to construct refined pseudo-label trajectories. When applied during fine-tuning, verifier-guided pseudo-labeling substantially improves the quality of supervision and enables data-efficient adaptation to unlabeled videos. Extensive experiments on four real-world benchmarks demonstrate that our approach achieves state-of-the-art results while requiring less data than prior self-training methods.
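
The per-frame selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the learned verifier is replaced by precomputed reliability scores passed in as input, selection is assumed to be a simple per-frame argmax over those scores, and the names `select_pseudo_labels`, `candidates`, and `scores` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def select_pseudo_labels(
    candidates: Sequence[Sequence[Point]],  # candidates[k][t]: (x, y) from tracker k at frame t
    scores: Sequence[Sequence[float]],      # scores[k][t]: assumed verifier reliability score
) -> Tuple[List[Point], List[int]]:
    """Build one pseudo-label trajectory by keeping, at every frame,
    the candidate with the highest reliability score.

    Assumption: the verifier's output is already available as `scores`;
    in the paper it is produced by a learned meta-model.
    """
    num_trackers = len(candidates)
    if num_trackers == 0:
        raise ValueError("at least one tracker is required")
    num_frames = len(candidates[0])
    if len(scores) != num_trackers:
        raise ValueError("scores must have one row per tracker")
    for k in range(num_trackers):
        if len(candidates[k]) != num_frames or len(scores[k]) != num_frames:
            raise ValueError("all trackers must cover the same frames")

    trajectory: List[Point] = []
    chosen: List[int] = []
    for t in range(num_frames):
        # Ties go to the tracker with the lowest index (behavior of max()).
        best = max(range(num_trackers), key=lambda k: scores[k][t])
        trajectory.append(candidates[best][t])
        chosen.append(best)
    return trajectory, chosen


# Toy example: two trackers following one query point over three frames.
candidates = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # tracker 0
    [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5)],  # tracker 1
]
scores = [
    [0.9, 0.2, 0.6],  # made-up reliability scores for tracker 0
    [0.1, 0.8, 0.6],  # made-up reliability scores for tracker 1
]
trajectory, chosen = select_pseudo_labels(candidates, scores)
print(chosen)      # index of the tracker kept at each frame
print(trajectory)  # resulting pseudo-label trajectory
```

Because the choice is made independently at each frame, the resulting trajectory can switch between trackers over time, which is what allows it to differ from every individual teacher's prediction.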