CVPR Poster LLaVA-Critic: Learning to Evaluate Multimodal Models

Tianyi Xiong · Xiyao Wang · Dong Guo · Qinghao Ye · Haoqi Fan · Quanquan Gu · Heng Huang · Chunyuan Li

ExHall D Poster #283

Sat 14 Jun 8:30 a.m. PDT — 10:30 a.m. PDT

Abstract: We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as a generalist evaluator to assess performance across a wide range of multimodal tasks. LLaVA-Critic is trained using a high-quality critic instruction-following dataset that incorporates diverse evaluation criteria and scenarios. Our experiments demonstrate the model's effectiveness in two key areas:

(i) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation scores, performing on par with or surpassing GPT models on multiple evaluation benchmarks; and

(ii) Preference Learning, where it generates reward signals for preference learning, enhancing model alignment capabilities.

This work underscores the potential of open-source LMMs in self-critique and evaluation, setting the stage for future research into scalable, superhuman alignment feedback mechanisms for LMMs.