D^3FER: Dual Channel and Dual Branch Network for Robust Facial Expression Recognition under Dual Challenges
Hui Tang ⋅ Yifan He ⋅ Zhong Jin
Abstract
Facial expression recognition (FER) in the wild remains a challenging task due to the coexistence of data noise and label noise. While existing methods often address one type of noise in isolation, they struggle to achieve robust performance under the compound effects of both. To this end, we propose D^3FER (Dual Channel and Dual Branch Network for Robust Facial Expression Recognition under Dual Noise), a unified framework that simultaneously tackles data and label noise in a single architecture. D^3FER introduces a dual-channel augmentation strategy, pairing weakly and strongly augmented views, to facilitate reliable pseudo-label generation and noise-aware training. Coupled with a dynamic queue mechanism, it adaptively estimates a noise threshold based on historical prediction confidence, enabling automatic identification and correction of label noise. Furthermore, inspired by contrastive learning, we design a momentum-updated Query-Key dual-branch structure that enhances intra-class compactness and inter-class separability, thereby improving robustness to data noise. At inference time, the stable Key branch parameters are leveraged to ensure consistent and generalized predictions. Extensive experiments on major in-the-wild benchmarks demonstrate that D^3FER outperforms state-of-the-art methods, setting new records in both accuracy and robustness under realistic, noisy conditions. The source code is available at https://github.com/D3FER/D3FER.
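The dynamic-queue thresholding and momentum-updated Key branch described in the abstract can be sketched as follows. This is a minimal illustration under assumed update rules (a sliding-window mean as the noise threshold, a MoCo-style exponential moving average for the Key branch); all names and constants are hypothetical, not the authors' implementation.

```python
from collections import deque


class DynamicQueue:
    """Hypothetical sketch: keep a sliding window of historical prediction
    confidences and derive an adaptive noise threshold from them
    (here simply the window mean)."""

    def __init__(self, maxlen=1000):
        self.buf = deque(maxlen=maxlen)

    def update(self, confidence):
        # Record the model's confidence for one training sample.
        self.buf.append(confidence)

    def threshold(self, default=0.5):
        # Adaptive threshold; fall back to a default before any history exists.
        return sum(self.buf) / len(self.buf) if self.buf else default


def momentum_update(query_params, key_params, m=0.999):
    """MoCo-style EMA: the Key branch slowly tracks the Query branch,
    yielding stable parameters for inference. Parameters are plain
    floats here for brevity; in practice they would be tensors."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


# Usage sketch: samples whose confidence falls below the adaptive
# threshold would be flagged as potential label noise and corrected
# with pseudo-labels from the weakly augmented view.
queue = DynamicQueue(maxlen=4)
for c in (0.9, 0.8, 0.7, 0.6):
    queue.update(c)
print(queue.threshold())                      # mean of the last 4 confidences
print(momentum_update([1.0], [0.0], m=0.9))   # Key drifts slightly toward Query
```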