CLEX: Complementary Label Exchange Learning for Noisy Facial Expression Recognition
Lin Wang ⋅ Fang Liu ⋅ Xiaofen Xing ⋅ Kailing Guo ⋅ Xiangmin Xu
Abstract
Facial expression recognition (FER) in the wild is severely hampered by label noise and annotation ambiguity. Existing methods, including sample selection, label ensembling, and consistency regularization, primarily rely on ordinary label supervision and offer limited control over non-target predictions, leading to spurious activations and overfitting to noisy labels. To address this limitation, we propose Complementary Label Exchange Learning (CLEX), a novel learning framework that enhances robustness by exchanging knowledge from non-target predictions across augmented views. Specifically, CLEX comprises three synergistic components. First, Stochastic Non-Target Logit Exchange randomly swaps a subset of non-target logits between original and augmented views to couple error-prone predictions, creating robust consistency constraints. Second, Scale-Invariant Logit Normalization eliminates magnitude artifacts through $L_p$-norm normalization, ensuring that regularization operates over geometrically meaningful directions rather than being dominated by arbitrary scales. Third, Complementary Suppression Loss selectively penalizes spurious activations over a randomly retained subset of non-target classes, avoiding the uniform shrinkage that hampers discriminative learning. To further stabilize training, we incorporate attention consistency regularization that enforces spatial alignment between augmented views, while retaining auxiliary cross-entropy to preserve semantic localization capability. Extensive experiments on multiple benchmark FER datasets (RAF-DB, FERPlus, and AffectNet) demonstrate that CLEX consistently outperforms existing robust FER learning approaches.
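The three components described in the abstract can be illustrated with a minimal NumPy sketch. All function names, the exchange ratio `rho`, the norm order `p`, and the retained-subset size `k` below are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of CLEX's three components; parameter choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def exchange_nontarget_logits(z1, z2, target, rho=0.5):
    """Stochastic Non-Target Logit Exchange: randomly swap a fraction `rho`
    of non-target logits between the original and augmented views."""
    z1, z2 = z1.copy(), z2.copy()
    non_target = np.array([c for c in range(z1.shape[0]) if c != target])
    n_swap = max(1, int(rho * non_target.size))
    swap = rng.choice(non_target, size=n_swap, replace=False)
    z1[swap], z2[swap] = z2[swap], z1[swap]  # fancy indexing yields copies
    return z1, z2

def lp_normalize(z, p=2, eps=1e-12):
    """Scale-Invariant Logit Normalization: project logits onto the L_p sphere
    so regularization depends on direction, not magnitude."""
    return z / (np.linalg.norm(z, ord=p) + eps)

def complementary_suppression_loss(z, target, k=3):
    """Complementary Suppression Loss: penalize predicted mass on a randomly
    retained subset of k non-target classes (rather than all of them)."""
    non_target = np.array([c for c in range(z.shape[0]) if c != target])
    keep = rng.choice(non_target, size=k, replace=False)
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    return float(probs[keep].sum())

z1 = rng.normal(size=7)   # logits over 7 expression classes (e.g. RAF-DB)
z2 = rng.normal(size=7)   # logits from an augmented view
y = 3                     # (possibly noisy) target class index

z1x, z2x = exchange_nontarget_logits(z1, z2, y)
loss = complementary_suppression_loss(lp_normalize(z1x), y)
print(f"suppression loss on retained subset: {loss:.4f}")
```

Note that the exchange never touches the target logit, so the supervised signal is preserved while the error-prone non-target predictions are coupled across views.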