Poster

StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues

Zanxi Ruan ⋅ Songqun Gao ⋅ Qiuyu Kong ⋅ Yiming Wang ⋅ Marco Cristani

Abstract
