ORBIT: Benchmarking SfM in the Wild with 360° Video
Abstract
Structure-from-Motion (SfM) is a cornerstone of 3D perception, yet current methods often fail on complex videos with challenging camera motion or dynamic scenes. Compounding the problem, the field lacks reliable ground-truth benchmarks for such difficult scenarios, which makes it hard to gauge real-world progress or to pinpoint where improvements are most needed. To address this gap, we introduce a new benchmark for evaluating camera pose estimation. Our key insight is to use online panoramic 360° videos as a data source from which we can construct challenging clips while still recovering robust ground-truth trajectories. The full field of view of these videos provides richer visual context for tracking camera motion than a standard perspective view, even when parts of the view are affected by blur, motion, or dynamic objects. After tracking camera motion across the full 360° videos, we crop and reproject selected portions into perspective-view clips, which form our benchmark, ORBIT: a diverse collection of 100 video clips. Experiments show that COLMAP and other state-of-the-art SfM methods struggle to accurately estimate camera positions on ORBIT, indicating that this remains a challenging, open problem for future research. ORBIT therefore provides a valuable testbed where researchers can meaningfully compete and measure progress on truly challenging, real-world SfM problems.
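The crop-and-reproject step can be illustrated with a minimal sketch. It is not the ORBIT pipeline; it rests on several assumptions: the panorama is stored in equirectangular form, the virtual perspective camera is a pinhole camera with OpenCV-style axes (x right, y down, z forward), pixels are sampled with nearest-neighbor lookup, and the virtual camera shares the panorama's optical center. Under that last assumption, the crop's camera-to-world rotation is simply the panorama's rotation composed with the crop's yaw and pitch. All function names (`rot_y`, `rot_x`, `pixel_to_equirect`, `reproject`, `crop_pose`) are hypothetical.

```python
import math

Mat3 = list[list[float]]
Vec3 = list[float]


def rot_y(theta: float) -> Mat3:
    """Yaw: turns the forward axis (+z) toward +x for positive theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_x(phi: float) -> Mat3:
    """Pitch: tilts the forward axis (+z) toward -y, i.e. up, since y points down."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def pixel_to_equirect(u, v, width, height, fov_deg, r_crop, pano_w, pano_h):
    """Map a perspective pixel (u, v) to continuous equirectangular coordinates (x, y)."""
    f = (width / 2) / math.tan(math.radians(fov_deg) / 2)  # focal length from horizontal FOV
    ray = [(u - width / 2) / f, (v - height / 2) / f, 1.0]  # ray in the virtual camera frame
    d = matvec(r_crop, ray)  # same ray expressed in the panorama frame
    lon = math.atan2(d[0], d[2])  # longitude in [-pi, pi], 0 = panorama forward
    lat = math.atan2(-d[1], math.hypot(d[0], d[2]))  # latitude, positive = up
    x = (lon / (2 * math.pi) + 0.5) * pano_w
    y = (0.5 - lat / math.pi) * pano_h
    return x, y


def reproject(pano, width, height, fov_deg, yaw, pitch):
    """Render a width x height perspective crop from an equirectangular image (list of rows)."""
    pano_h, pano_w = len(pano), len(pano[0])
    r_crop = matmul(rot_y(yaw), rot_x(pitch))
    out = []
    for v in range(height):
        row = []
        for u in range(width):
            # Sample at pixel centers (u + 0.5, v + 0.5).
            x, y = pixel_to_equirect(u + 0.5, v + 0.5, width, height, fov_deg, r_crop, pano_w, pano_h)
            xi = int(x) % pano_w  # longitude wraps around the seam
            yi = min(max(int(y), 0), pano_h - 1)  # latitude is clamped at the poles
            row.append(pano[yi][xi])
        out.append(row)
    return out


def crop_pose(r_pano: Mat3, yaw: float, pitch: float) -> Mat3:
    """Camera-to-world rotation of the virtual crop, given the panorama's camera-to-world rotation."""
    return matmul(r_pano, matmul(rot_y(yaw), rot_x(pitch)))
```

Under the shared-center assumption, a trajectory recovered once from the full 360° video transfers to every crop through a fixed rotation, which is what would let ground truth exist for clips that are hard for SfM on their own.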