Beyond Scanpaths: Graph-Based Gaze Simulation in Dynamic Scenes
Abstract
Accurately modelling human attention is essential for numerous computer vision applications, particularly in the domain of automotive safety. Existing methods often collapse gaze into scanpaths or saliency maps, overlooking the dynamics of natural eye movements and introducing artefacts into training data. We instead propose a dynamical systems approach that treats gaze as an active agent interacting with its environment, enabling the simulation of raw, continuous gaze trajectories. In our approach, driving scenes are represented as gaze-centric spatiotemporal graphs processed by the Affinity Relation Transformer (ART), a heterogeneous graph transformer that models interactions between driver gaze, surrounding traffic objects, and road structure. We further introduce an Object Density Network (ODN) to predict next-step gaze distributions, capturing the stochastic, object-centric nature of attentional shifts in complex environments. To support this research, we also present Focus100, a new dataset of gaze recordings from 30 participants viewing ego-centric driving footage. Trained directly on raw gaze, without any fixation filtering, our unified approach produces more natural gaze time series, scanpath dynamics, and saliency maps than existing attention estimation methods, offering valuable insights for the temporal modelling of human attention and for automotive safety applications.