Poster
Reasoning to Attend: Try to Understand How <SEG> Token Works
Rui Qian · Xin Yin · Dejing Dou
ExHall D Poster #353
Abstract:
Tasks such as visual grounding and segmentation, empowered by current Large Multimodal Models (LMMs), typically rely on the $\texttt{<SEG>}$ token as a text prompt to jointly optimize the vision-language model (e.g., LLaVA) and the downstream task-specific model (e.g., SAM), yet little research has examined how this token actually works. In this work, we first visualize the similarity maps obtained by computing the semantic similarity between the $\texttt{<SEG>}$ token and the image token embeddings derived from the last hidden layer. Intriguingly, the activation responses in these maps are strikingly consistent, revealing that what the $\texttt{<SEG>}$ token contributes is semantic similarity within image-text pairs: the $\texttt{<SEG>}$ token, a placeholder expanded into the text vocabulary, queries the tokenized image patches to match the semantics of an object from the text to the paired image while the Large Language Model is fine-tuned. Building upon these findings, we present $\textbf{READ}$, which facilitates LMMs' resilient $\textbf{REA}$soning capability of where to atten$\textbf{D}$ under the guidance of highly activated points borrowed from the similarity maps. READ features an intuitive Similarity-as-Points (SasP) module that can be seamlessly applied to $\texttt{<SEG>}$-like paradigms in a plug-and-play fashion. We conduct extensive experiments on the ReasonSeg and RefCOCO(+/g) datasets. To validate whether READ suffers from catastrophic forgetting of previous skills after fine-tuning, as in prior $\texttt{<SEG>}$-based models (e.g., LISA), we further assess its generation ability on the FP-RefCOCO(+/g) dataset. All code and models will be publicly available.
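The mechanism the abstract describes can be sketched in a few lines: compare the hidden state of the $\texttt{<SEG>}$ placeholder against the hidden states of the image patch tokens to obtain a similarity map, then take the most highly activated locations as candidate point prompts for a mask decoder such as SAM. The snippet below is a minimal illustration under stated assumptions, not the authors' released implementation; the function names, the cosine-similarity measure, and the top-k point selection are assumptions made here for clarity.

# Minimal sketch (not the released READ code): build a similarity map between a
# hypothetical <SEG> token embedding and image patch embeddings, then pick the
# most activated locations as point prompts, in the spirit of Similarity-as-Points.
import torch
import torch.nn.functional as F

def similarity_map(seg_token: torch.Tensor, image_tokens: torch.Tensor,
                   grid_hw: tuple[int, int]) -> torch.Tensor:
    """seg_token: (D,) hidden state of the <SEG> placeholder.
    image_tokens: (N, D) hidden states of the tokenized image patches, N == H * W.
    Returns an (H, W) map of cosine similarities."""
    h, w = grid_hw
    sims = F.cosine_similarity(image_tokens, seg_token.unsqueeze(0), dim=-1)  # (N,)
    return sims.view(h, w)

def top_points(sim_map: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Return the (row, col) coordinates of the k most activated locations,
    which could serve as point prompts for a downstream mask decoder (e.g., SAM)."""
    flat = sim_map.flatten()
    idx = torch.topk(flat, k=k).indices
    rows, cols = idx // sim_map.shape[1], idx % sim_map.shape[1]
    return torch.stack([rows, cols], dim=-1)  # (k, 2)

# Toy usage with random embeddings (D = 256, a 24x24 patch grid).
if __name__ == "__main__":
    torch.manual_seed(0)
    seg = torch.randn(256)
    patches = torch.randn(24 * 24, 256)
    smap = similarity_map(seg, patches, (24, 24))
    print(top_points(smap, k=3))

In this toy form the "points" are just argmax locations of the similarity map; the paper's plug-and-play claim rests on the fact that such points can be fed to any $\texttt{<SEG>}$-style pipeline without retraining its components from scratch.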