VKG-QA: Visual Knowledge Graph-based Question Answering for Large Multimodal Models
Abstract
Understanding and reasoning over structured knowledge is a fundamental capability for intelligent systems. While Large Language Models (LLMs) have leveraged textual knowledge graphs for relational reasoning, linearizing graph structures into text often leads to token inefficiency and loss of higher-order relational cues. Inspired by the advances of Large Multimodal Models (LMMs), we propose a novel paradigm of \textit{visualized knowledge representation}, in which knowledge graphs are transformed into graphical visualizations that LMMs can directly perceive and reason over, capturing higher-order relational structures explicitly. To systematically evaluate this capability, we introduce \textbf{VKG-QA}, a benchmark for \textit{Visual Knowledge Graph-based Question Answering}, covering three major categories and fourteen subtasks. VKG-QA is constructed via a semi-automatic pipeline that ensures high-quality, semantically aligned, and visually clear data. We evaluate 19 representative LMMs on VKG-QA and perform extensive quantitative and qualitative analyses. The results reveal that current models struggle with visualized relational understanding, that graph-specific comprehension remains challenging, and that closed-source models significantly outperform their open-source counterparts. VKG-QA thus highlights critical limitations of current LMMs and provides a scalable platform for advancing graph-aware visual reasoning.