Seeing With Sound: Long-range Acoustic Beamforming for Multimodal Scene Understanding

Praneeth Chakravarthula · Jim Aldon D’Souza · Ethan Tseng · Joe Bartusek · Felix Heide

West Building Exhibit Halls ABC 093


Existing autonomous vehicles primarily use sensors that rely on electromagnetic waves which are undisturbed in good environmental conditions but can suffer in adverse scenarios, such as low light or for objects with low reflectance. Moreover, only objects in direct line-of-sight are typically detected by these existing methods. Acoustic pressure waves emanating from road users do not share these limitations. However, such signals are typically ignored in automotive perception because they suffer from low spatial resolution and lack directional information. In this work, we introduce long-range acoustic beamforming of pressure waves from noise directly produced by automotive vehicles in-the-wild as a {complementary sensing modality} to traditional optical sensor approaches for detection of objects in dynamic traffic environments. To this end, we introduce the first multimodal long-range acoustic beamforming dataset. We propose a neural aperture expansion method for beamforming and we validate its utility for multimodal automotive object detection. We validate the benefit of adding sound detections to existing RGB cameras in challenging automotive scenarios, where camera-only approaches fail or do not deliver the ultra-fast rates of pressure sensors.

Chat is not available.