X-PCR: A Benchmark for Cross-modality Progressive Clinical Reasoning in Ophthalmic Diagnosis
Abstract
Despite significant progress in Multi-modal Large Language Models (MLLMs), their clinical reasoning capacity in complex multi-modal diagnostic scenarios remains largely unexamined. Current benchmarks, predominantly limited to single-modality data, cannot evaluate the progressive reasoning and cross-modal integration essential for clinical practice. To bridge this gap, we introduce the Cross-Modality Progressive Clinical Reasoning (X-PCR) benchmark, the first comprehensive evaluation framework for MLLMs spanning the complete ophthalmic diagnostic workflow. X-PCR incorporates two core reasoning tasks: 1) a six-stage progressive reasoning chain spanning image quality assessment to clinical decision-making, and 2) a cross-modality reasoning task integrating six ophthalmic imaging modalities. The benchmark comprises 26,415 images and 177,868 expert-verified VQA pairs curated from 51 public datasets, covering 52 ophthalmic diseases. Our evaluation of 21 leading MLLMs reveals critical gaps in progressive reasoning and cross-modal integration. X-PCR establishes a unified benchmark to advance MLLMs from task-specific performance to comprehensive diagnostic capability through aligned multi-modal clinical data. The dataset and code will be publicly released.