Poster

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

Bo-Yuan Sun ⋅ Bo-Wen Yin ⋅ Yuan-Ming Li ⋅ Xihan Wei ⋅ Qibin Hou

Abstract
