Through the Frequency Lens: Cross-Domain Generalisable Gaze Estimation with Adaptive Modulation
Abstract
Deep learning-based gaze estimation methods often exhibit significant performance degradation on unseen target domains. Through systematic frequency-domain analysis, we reveal that face images contain frequency components with distinct contributions: some facilitate cross-domain generalization, while others introduce domain-specific interference that impedes it; both vary across datasets and constitute a key source of the domain gap. Based on these observations, we propose Frequency-Guided Adaptive Learning (FGAL), a novel framework that enhances domain generalization without accessing target-domain data. FGAL consists of two complementary modules: the Adaptive Interference Suppression Module (AISM) and the Spectrum Diversification Module (SDM). AISM adaptively suppresses sample-specific interfering frequency components through learnable modulation maps, while SDM diversifies frequency distribution patterns to enhance robustness against cross-domain variations. Experiments across multiple cross-domain settings show that FGAL achieves substantial improvements, outperforming baselines by up to 28.2\% and state-of-the-art methods by up to 19.5\%, demonstrating the framework's potential for broader domain generalization tasks.
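The core operation behind frequency-domain modulation can be illustrated with a minimal sketch: transform an image to the Fourier domain, multiply its spectrum elementwise by a modulation map, and transform back. This is not the authors' implementation; FGAL learns sample-specific maps, whereas the `modulation_map` and the low-pass mask below are hypothetical, fixed examples.

```python
import numpy as np

def frequency_modulate(image, modulation_map):
    """Suppress selected frequency components of an image by applying an
    elementwise modulation map in the Fourier domain.
    Illustrative only: FGAL's AISM learns its maps per sample."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # DC component to center
    modulated = spectrum * modulation_map           # elementwise suppression
    return np.real(np.fft.ifft2(np.fft.ifftshift(modulated)))

# Toy example: a hand-crafted low-pass map on an 8x8 random image,
# zeroing high-frequency components.
rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0  # keep only the central (low-frequency) band
out = frequency_modulate(img, mask)
```

In a learnable variant, `mask` would be a trainable tensor optimized jointly with the gaze estimator so that interfering components are attenuated rather than hard-zeroed.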