

Workshop

Sight and Sound

Andrew Owens · Jiajun Wu · Kristen Grauman · Antonio Torralba · William Freeman · Andrew Zisserman · Hang Zhao · Ruohan Gao · Triantafyllos Afouras · Arsha Nagrani · Jean-Charles Bazin

211

Wed 11 Jun, 6:30 a.m. PDT

Keywords:  Multimodal learning  

Since nearly every video has an audio track, the prospect of learning from paired audio-visual data — whether through new forms of unsupervised learning or by simply incorporating sound into existing vision algorithms — is intuitively appealing, and this workshop will cover recent advances in this direction. It will also touch on higher-level questions, such as what information sound conveys that vision does not, the merits of sound relative to other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We will also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing.