Skip to yearly menu bar Skip to main content


CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering

Shaowei Wang · Lingling Zhang · Longji Zhu · Tao Qin · Kim-Hui Yap · Xinyu Zhang · Jun Liu

Arch 4A-E Poster #418
[ ]
Thu 20 Jun 10:30 a.m. PDT — noon PDT


Diagram Question Answering (DQA) is a challenging task, requiring models to answer natural language questions based on visual diagram contexts. It serves as a crucial basis for academic tutoring, technical support, and more practical applications. DQA poses significant challenges, such as the demand for domain-specific knowledge and the scarcity of annotated data, which restrict the applicability of large-scale deep models. Previous approaches have explored external knowledge integration through pre-training, but these methods are costly and can be limited by domain disparities. While Large Language Models (LLMs) show promise in question-answering, there is still a gap in how to cooperate and interact with the diagram parsing process. In this paper, we introduce the Chain-of-Guiding Learning Model for Diagram Question Answering (CoG-DQA), a novel framework that effectively addresses DQA challenges. CoG-DQA leverages LLMs to guide diagram parsing tools (DPTs) through the guiding chains, enhancing the precision of diagram parsing while introducing rich background knowledge. Our experimental findings reveal that CoG-DQA surpasses all comparison models in various DQA scenarios, achieving an average accuracy enhancement exceeding 5\% and peaking at 11\% across four datasets. These results underscore CoG-DQA's capacity to advance the field of visual question answering and promote the integration of LLMs into specialized domains.

Live content is unavailable. Log in and register to view live content