Exposing and Evaluating Hallucinations for GUI Grounding
Abstract
Existing GUI benchmarks primarily evaluate models’ comprehensive capabilities but largely overlook hallucination phenomena in grounding tasks, which directly undermine the reliability of GUI understanding. In this work, we expose two major types of hallucination in GUI grounding: 1) Confusion Hallucination, where a distractor element is mistakenly selected, and 2) Fabricated Hallucination, where a nonexistent element is localized with plausible coordinates. To systematically investigate their origins, we introduce GUI-HalluBench, a benchmark comprising two complementary subsets: a parsing subset that assesses the structural representation of GUI elements and a hallucination subset that measures grounding robustness under challenging conditions. This design allows us to link hallucination patterns to deficiencies in prerequisite abilities; in particular, parsing errors are closely tied to both fabricated and confusion hallucinations. Experiments on state-of-the-art models confirm these connections, offering new insights into the root causes of hallucinations and guiding the development of more reliable GUI understanding tools.