

Tutorial

Evaluations and Benchmarks in Context of Multimodal LLM

208 A
Wed 11 Jun 11 a.m. PDT — 3 p.m. PDT

Abstract:

Despite the emergence of various benchmarks for evaluating Multimodal Large Language Models (MLLMs), the validity and effectiveness of these evaluations remain open to discussion. This tutorial addresses the need for comprehensive and scientifically valid benchmarks in MLLM development. The tutorial will offer a systematic overview of current MLLM benchmarks and discuss the performance enhancements necessary for achieving human-level AGI. We will introduce recent developments in MLLMs, survey existing benchmarks, and explore evaluation methods. Detailed discussions will cover vision-language capabilities, video-modality evaluation, and expert-level skills across multiple disciplines. We will further identify gaps in benchmarking multimodal generalists and introduce methods to comprehensively evaluate MLLMs. Finally, a special focus will be placed on addressing and mitigating the frequent hallucination phenomena in MLLMs to enhance model reliability.
