Render-to-Adapt: Unsupervised Personal Adaptation for Gaze Estimation
Abstract
Deep learning-based gaze estimation methods tend to suffer substantial performance drops in real-world scenarios with varying users and environments. To tackle this issue, recent approaches typically employ Unsupervised Domain Adaptation (UDA) to bridge the gap between source and target domains. However, this paradigm is misaligned with real-world scenarios, where the system usually needs to adapt to only a single new user. Therefore, this paper advocates a more practical paradigm: Unsupervised Personal Adaptation (UPA), which calibrates a pre-trained model using a few unlabeled images from a single new user. Conventional UDA methods do not guarantee improvements for every user and often yield lower average performance in this setting. To address this problem, we propose Render-to-Adapt (R2A), a self-supervised framework specifically designed for the UPA task. Given a pre-trained gaze model, R2A utilizes a gaze-conditioned renderer to synthesize new images based on the model's gaze predictions, and enforces eye-region consistency as a label-free signal to enhance personalized gaze estimation. We evaluate R2A on a redesigned cross-dataset personal adaptation benchmark. Experimental results show that R2A consistently improves performance across all individuals and significantly outperforms existing state-of-the-art methods.
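To make the adaptation loop sketched in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of one unsupervised update: the current model predicts gaze for a new user's unlabeled images, a frozen gaze-conditioned renderer re-synthesizes images from those predictions, and an eye-region consistency loss between the real and rendered images drives the update. The classes `GazeEstimator` and `GazeConditionedRenderer`, the function `extract_eye_region`, and all hyperparameters are illustrative placeholders, not the paper's released implementation.

```python
# Hypothetical sketch of the R2A-style adaptation step described in the abstract.
# All module/function names below are placeholders introduced for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GazeEstimator(nn.Module):
    """Toy stand-in for a pre-trained gaze model: image -> (pitch, yaw)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
        )

    def forward(self, x):
        return self.backbone(x)


class GazeConditionedRenderer(nn.Module):
    """Toy stand-in for a frozen renderer: (image, gaze) -> re-rendered image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3 + 2, 3, 3, padding=1)

    def forward(self, image, gaze):
        # Broadcast the gaze vector over the spatial grid and condition on it.
        gaze_map = gaze[:, :, None, None].expand(-1, -1, *image.shape[-2:])
        return torch.sigmoid(self.net(torch.cat([image, gaze_map], dim=1)))


def extract_eye_region(image):
    # Placeholder: crop a fixed horizontal band where the eyes are assumed to lie.
    h = image.shape[-2]
    return image[..., h // 4 : h // 2, :]


def adaptation_step(estimator, renderer, images, optimizer):
    """One label-free update on a batch of a single user's unlabeled images."""
    pred_gaze = estimator(images)            # gaze predicted by the current model
    rendered = renderer(images, pred_gaze)   # synthesize images from that prediction
    # Eye-region consistency between real and rendered images as the training signal.
    loss = F.l1_loss(extract_eye_region(rendered), extract_eye_region(images))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    estimator, renderer = GazeEstimator(), GazeConditionedRenderer()
    for p in renderer.parameters():          # keep the renderer frozen
        p.requires_grad_(False)
    opt = torch.optim.Adam(estimator.parameters(), lr=1e-4)
    batch = torch.rand(4, 3, 64, 64)         # a few unlabeled images of one user
    print(adaptation_step(estimator, renderer, batch, opt))
```

Under these assumptions, gradients flow from the consistency loss through the frozen renderer back to the estimator's gaze prediction, so only the gaze model is personalized to the new user.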