GazeShift: Unsupervised Gaze Estimation and Dataset for VR
Abstract
Gaze estimation is instrumental in modern virtual reality (VR) systems. Despite significant progress in remote-camera gaze estimation, VR gaze research remains constrained by data scarcity, particularly the lack of large-scale, accurately labeled datasets captured with the off-axis camera configurations typical of modern headsets. Annotation is further complicated because fixation on the intended targets cannot be guaranteed. To address these challenges, we introduce VRGaze, the first large-scale off-axis gaze estimation dataset for VR, comprising 2.1 million near-eye infrared images collected from 68 participants. We further propose GazeShift, an attention-guided unsupervised framework that learns gaze representations without labeled data. Unlike prior redirection-based methods that rely on multi-view imagery or 3D geometry, GazeShift is tailored to near-eye infrared images and achieves effective gaze–appearance disentanglement in a compact, real-time model. A lightweight few-shot calibration step can optionally adapt the learned embeddings to individual users, yielding 1.84° mean error on VRGaze under per-person calibration and 7.15° on MPIIGaze under person-agnostic calibration, with a tenfold reduction in parameters and a 5 ms runtime on a VR headset GPU. Quantitative robustness analyses confirm invariance to illumination variation, demonstrating a label-efficient and deployable solution for VR gaze estimation. VRGaze and GazeShift are available at \url{https://github.com/gazeshift3/gazeshift}.