

Poster

Supervising Sound Localization using In-the-wild Egomotion

Anna Min · Ziyang Chen · Hang Zhao · Andrew Owens


Abstract:

We present a method for learning binaural sound localization from ego-motion in videos. When the camera moves in a video, the directions of sound sources relative to the camera change with it. We train an audio model to predict sound directions that are consistent with visual estimates of camera motion, which we obtain using methods from multi-view geometry. This provides a weak but plentiful form of supervision that we combine with traditional binaural cues. To evaluate this idea, we propose a dataset of real-world audio-visual videos with ego-motion. We show that our model can successfully learn from this real-world data and that it obtains strong performance on sound localization tasks.
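To make the supervisory signal concrete, here is a minimal sketch of one possible consistency constraint, under assumptions not stated in the abstract: the source is static, the camera motion is reduced to a single yaw rotation `yaw_delta` (radians, counterclockwise positive) between two clips, and the audio model outputs one azimuth per clip. Under these assumptions a rotation of the camera by `yaw_delta` should shift the source's camera-relative azimuth by `-yaw_delta`. The function names `wrap_angle` and `rotation_consistency_loss` are hypothetical and are not the authors' implementation.

```python
import math


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def rotation_consistency_loss(theta_1: float, theta_2: float, yaw_delta: float) -> float:
    """Hypothetical loss for a static source under a pure yaw rotation.

    theta_1, theta_2: azimuths predicted by the audio model for two clips.
    yaw_delta: camera yaw change between the clips, e.g. from a visual
    multi-view geometry estimate (assumed sign convention: CCW positive).
    """
    # If the camera turns by yaw_delta, the source appears to turn by -yaw_delta.
    expected_theta_2 = theta_1 - yaw_delta
    # Compare angles on the circle so that +pi and -pi count as the same direction.
    err = wrap_angle(theta_2 - expected_theta_2)
    return err * err


def batch_loss(pairs):
    """Mean loss over (theta_1, theta_2, yaw_delta) triples."""
    return sum(rotation_consistency_loss(*p) for p in pairs) / len(pairs)
```

Wrapping the error onto the circle matters because azimuth is periodic: predictions of 179° and -179° are nearly identical directions and should incur almost no penalty. In practice such a term would be combined with other cues (the abstract mentions traditional binaural cues), since rotation alone leaves ambiguities, for example front-back confusions.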
