DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs
Yanbin Wei ⋅ Jiangyue Yan ⋅ Chun Kang ⋅ Yang Chen ⋅ Hua Liu ⋅ James Kwok ⋅ Yu Zhang
Abstract
Vision-Language Models (VLMs) have emerged as versatile solutions for zero-shot question answering (QA) across various domains. However, enabling VLMs to effectively comprehend structured graphs and perform accurate, efficient QA remains challenging. Existing approaches typically rely on a single type of graph topology representation (GTR), such as fixed-style visual images or unified text descriptions. This "one-size-fits-all" strategy often neglects model-specific and task-specific preferences, resulting in inaccurate or overly lengthy responses to graph-related queries. To address this, we propose the DynamicGTR framework, which dynamically selects the optimal GTR for each query during inference, thereby enhancing the zero-shot graph QA capabilities of VLMs with a customizable accuracy-brevity trade-off. Extensive experiments show that DynamicGTR not only improves VLM-based graph algorithm QA performance but also successfully transfers knowledge learned on synthetic graph algorithm tasks to real-world applications such as link prediction and node classification, without any additional training. Additionally, DynamicGTR demonstrates strong transferability across tasks, domains, and models, suggesting its potential as a flexible solution for broad graph scenarios.
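The per-query selection with a customizable accuracy-brevity trade-off can be pictured as a simple scoring rule over candidate GTRs. The following is a minimal, hypothetical Python sketch, not the paper's implementation: the candidate names, the `pred_accuracy` estimates, the `token_cost` values, and the trade-off weight `lam` are all assumed placeholders for illustration.

```python
# Hypothetical sketch of dynamic GTR selection (not the paper's code).
# Each candidate GTR (e.g., a rendered image or a text description) is
# scored by a predicted-accuracy term and a brevity term; `lam` trades
# the two off, and the top-scoring GTR is fed to the VLM at inference.

from dataclasses import dataclass


@dataclass
class GTRCandidate:
    name: str             # e.g., "image/force-directed", "text/adjacency-list"
    pred_accuracy: float  # assumed: estimated QA accuracy with this GTR (0-1)
    token_cost: int       # assumed: prompt length if this GTR is used


def select_gtr(candidates: list[GTRCandidate], lam: float = 0.5) -> GTRCandidate:
    """Pick the GTR maximizing accuracy minus a lam-weighted length penalty."""
    max_cost = max(c.token_cost for c in candidates)

    def score(c: GTRCandidate) -> float:
        brevity = 1.0 - c.token_cost / max_cost  # normalized brevity in [0, 1]
        return (1.0 - lam) * c.pred_accuracy + lam * brevity

    return max(candidates, key=score)


if __name__ == "__main__":
    cands = [
        GTRCandidate("image/force-directed", pred_accuracy=0.90, token_cost=1200),
        GTRCandidate("text/adjacency-list", pred_accuracy=0.60, token_cost=400),
        GTRCandidate("text/edge-list", pred_accuracy=0.55, token_cost=300),
    ]
    print(select_gtr(cands, lam=0.3).name)  # low lam weights accuracy more
    print(select_gtr(cands, lam=0.8).name)  # high lam weights brevity more
```

Under this reading, a single knob (`lam`) exposes the accuracy-brevity trade-off the abstract describes: lowering it favors the representation the model answers best from, raising it favors the shortest prompt.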