Paper in Workshop: 8th Workshop and Competition on Affective & Behavior Analysis in-the-wild
Enhancing Continuous Emotion Recognition through CLIP Fine-Tuning and Sequential Learning
Weiwei Zhou · Chenkun Ling · Zefeng Cai
Human emotion recognition is essential for seamless human-computer interaction. This paper presents a novel approach to tackling the Valence-Arousal (VA) Estimation Challenge, the Expression Recognition Challenge, and the Action Unit (AU) Detection Challenge within the 8th ABAW competition framework. We propose a framework that enhances continuous emotion recognition by fine-tuning the CLIP model with the Aff-Wild2 dataset, leveraging annotated expression labels. The fine-tuned CLIP model serves as a robust visual feature extractor. Additionally, we integrate Temporal Convolutional Network (TCN) modules and Transformer Encoder modules to improve sequential learning. Our method significantly outperforms baseline models, achieving 3rd place in the VA Estimation Challenge and 2nd place in both the Expression Recognition Challenge and the AU Detection Challenge.
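To make the sequential-learning idea concrete, the following is a minimal sketch of the kind of dilated temporal convolution that a TCN module applies to a sequence of per-frame features. It is illustrative only: the abstract does not specify kernel sizes, dilation rates, channel mixing, or normalization, so the function names (`dilated_temporal_conv`, `tcn_block`), the shared scalar taps, the zero padding, and the residual ReLU layout are all assumptions rather than the authors' implementation. The per-frame vectors stand in for features that would come from the fine-tuned CLIP extractor.

```python
from typing import List

Frames = List[List[float]]  # indexed as [time][channel]


def dilated_temporal_conv(frames: Frames, taps: List[float], dilation: int = 1) -> Frames:
    """Length-preserving 1D convolution along the time axis.

    Assumption: one scalar weight per tap, shared across channels
    (a simplification of the learned per-channel kernels in a real TCN).
    Positions outside the sequence are treated as zeros.
    """
    if len(taps) % 2 == 0:
        raise ValueError("taps must have odd length so the kernel is centered")
    if not frames:
        return []
    half = len(taps) // 2
    n_frames, n_channels = len(frames), len(frames[0])
    out: Frames = []
    for t in range(n_frames):
        acc = [0.0] * n_channels
        for k, weight in enumerate(taps):
            src = t + (k - half) * dilation  # dilation widens the temporal reach
            if 0 <= src < n_frames:
                for c in range(n_channels):
                    acc[c] += weight * frames[src][c]
        out.append(acc)
    return out


def tcn_block(frames: Frames, taps: List[float], dilation: int) -> Frames:
    """Hypothetical residual block: input + ReLU(dilated convolution)."""
    conv = dilated_temporal_conv(frames, taps, dilation)
    return [
        [x + max(0.0, y) for x, y in zip(frame, conv_frame)]
        for frame, conv_frame in zip(frames, conv)
    ]


# Toy sequence of four frames with a single feature channel each.
features = [[1.0], [2.0], [3.0], [4.0]]
smoothed = tcn_block(features, taps=[1.0, 1.0, 1.0], dilation=1)
```

Stacking such blocks with increasing dilation lets each output frame draw on a progressively longer temporal context. In the framework described above, Transformer Encoder modules are used alongside the TCN modules for sequence modeling; how the two are arranged is not stated in the abstract.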