V2U4Real: A Real-world Large-scale Dataset for Vehicle-to-UAV Cooperative Perception
Abstract
Modern autonomous vehicle perception systems are often constrained by occlusions, blind spots, and limited sensing range, hindering progress toward Level 5 autonomy. While existing cooperative perception paradigms such as Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) have demonstrated their effectiveness in mitigating these challenges, they remain limited to ground-level collaboration and cannot fully address large-scale occlusions or long-range perception in complex 3D environments. To advance research in cross-view cooperative perception, we present V2U4Real, the first large-scale real-world multi-modal dataset for Vehicle-to-UAV (V2U) cooperative perception. V2U4Real is collected by a ground vehicle and a UAV, each equipped with LiDAR and multi-view RGB cameras. The dataset covers urban streets, university campuses, and rural roads under diverse traffic scenarios, comprising over 56K LiDAR frames, 56K multi-view camera images, and 700K manually annotated 3D bounding boxes across four object classes. To support a wide range of research tasks, we establish benchmarks for single-agent 3D object detection, cooperative 3D object detection, and object tracking. Comprehensive evaluations of several state-of-the-art models demonstrate the effectiveness of V2U cooperation in enhancing perception robustness and long-range awareness, particularly under severe occlusion.