Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving
Abstract
Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, is limited in scale, sensor-configuration diversity, and geographic and long-tail behavioral coverage. In contrast, in-the-wild data from sources such as dashcams offers immense scale and diversity, capturing critical long-tail scenarios and novel environments. However, this unstructured video data is incompatible with ADS that expect structured, multi-modal sensor inputs for validation and training. To bridge this data gap, we propose Sensor2Sensor, a novel generative modeling paradigm that translates in-the-wild monocular dashcam videos into a high-fidelity, multi-modal sensor suite, which we refer to as the AV log, comprising multi-view camera images and LiDAR point clouds. A core challenge is the lack of paired training data. We address it by converting real AV logs into dashcam-style videos via 4D Gaussian Splatting (4DGS) reconstruction and novel-view rendering. Sensor2Sensor then employs a diffusion architecture to perform the generative conversion. We conduct a comprehensive set of quantitative evaluations of the fidelity and realism of the generated sensor data, and we demonstrate Sensor2Sensor's practical utility by converting challenging in-the-wild internet and dashcam footage into realistic, multi-modal sensor data, unlocking vast external data sources for AV development.
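
The abstract outlines a two-stage recipe: paired training data is synthesized by rendering real AV logs as dashcam-style videos with 4DGS, and a diffusion model then learns the dashcam-to-AV-log conversion. The sketch below is a minimal, hypothetical illustration of that data flow only; all class and function names (`AVLog`, `render_dashcam_view`, `DashcamToAVLogDiffusion`, `build_paired_dataset`) are placeholders introduced here and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class AVLog:
    """Structured multi-modal sensor suite: multi-view images + LiDAR points."""
    multiview_images: List[np.ndarray]  # one HxWx3 array per camera
    lidar_points: np.ndarray            # Nx3 point cloud


def render_dashcam_view(log: AVLog, pose: np.ndarray) -> np.ndarray:
    """Stand-in for 4DGS reconstruction + novel-view rendering of an AV log as a
    monocular dashcam-style frame. A real system would fit a dynamic Gaussian-splat
    scene to the log and rasterize it from the given dashcam pose."""
    h, w = log.multiview_images[0].shape[:2]
    return np.zeros((h, w, 3), dtype=np.uint8)  # placeholder rendered frame


class DashcamToAVLogDiffusion:
    """Stand-in diffusion model mapping a dashcam frame to a full AV log."""

    def train_step(self, dashcam_frame: np.ndarray, target: AVLog) -> float:
        # A real implementation would noise the target sensor suite and train a
        # denoiser conditioned on the dashcam frame; return a dummy loss here.
        return 0.0

    def sample(self, dashcam_frame: np.ndarray) -> AVLog:
        h, w = dashcam_frame.shape[:2]
        return AVLog(
            multiview_images=[np.zeros((h, w, 3), dtype=np.uint8) for _ in range(6)],
            lidar_points=np.zeros((0, 3)),
        )


def build_paired_dataset(logs: List[AVLog]) -> List[Tuple[np.ndarray, AVLog]]:
    """Stage 1: synthesize (dashcam frame, AV log) pairs from real AV logs."""
    dashcam_pose = np.eye(4)  # hypothetical windshield-mounted camera pose
    return [(render_dashcam_view(log, dashcam_pose), log) for log in logs]


if __name__ == "__main__":
    # Stage 1: paired-data construction from a (toy) real AV log.
    demo_log = AVLog(
        multiview_images=[np.zeros((8, 8, 3), dtype=np.uint8) for _ in range(6)],
        lidar_points=np.zeros((100, 3)),
    )
    pairs = build_paired_dataset([demo_log])

    # Stage 2: train the diffusion converter, then apply it to in-the-wild footage.
    model = DashcamToAVLogDiffusion()
    for frame, target in pairs:
        model.train_step(frame, target)
    generated = model.sample(pairs[0][0])
    print(len(generated.multiview_images), generated.lidar_points.shape)
```

The key design point the sketch mirrors is that supervision comes entirely from real AV logs: the dashcam view is derived from the log via rendering, so the log itself serves as ground truth for the generative conversion.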