Workshop

Workshop on Video Large Language Models

Mubarak Shah ⋅ Larry S. Davis ⋅ Rene Vidal ⋅ Son Dinh Tran ⋅ Angela Yao ⋅ Salman Khan ⋅ Rita Cucchiara ⋅ Cees G. M. Snoek ⋅ Christoph Feichtenhofer ⋅ Chang Xu ⋅ Jayakrishnan Unnikrishnan ⋅ Afshin Dehghan ⋅ Mamshad Nayeem Rizve ⋅ Rohit Gupta ⋅ Swetha Sirnam ⋅ Ashmal Vayani ⋅ Omkar Thawakar ⋅ Muhammad Uzair Khattak ⋅ Dmitry Demidov


Abstract

This workshop will explore the evolution, applications, and challenges of Video Large Language Models (VidLLMs), the latest advancement in multimodal LLMs. It will feature keynote talks from leading researchers, a panel discussion comparing VidLLMs with expert models, and a poster session. The workshop also includes three challenge tracks designed to evaluate VidLLMs' capabilities in (1) compositional video retrieval, (2) complex video reasoning and robustness, and (3) multilingual video reasoning. These tracks aim to address key research areas such as training VidLLMs, applying them to specialized computer vision tasks, and evaluating their performance. Potential topics for invited papers include VidLLM methods and algorithms; data creation; evaluation and analysis; best practices; applications; and limitations, risks, and safety.
