

Poster

Seek Common Ground While Reserving Differences: Semi-supervised Image-Text Sentiment Recognition

Wuyou Xia · Guoli Jia · Sicheng Zhao · Jufeng Yang


Abstract:

Multimodal sentiment analysis has attracted extensive research attention as increasing numbers of users share images and texts to express their emotions and opinions on social media. Collecting large amounts of labeled sentiment data is expensive and challenging due to the high cost of annotation and unavoidable label ambiguity. Semi-supervised learning (SSL) has been explored to exploit the extensive unlabeled data and alleviate the demand for annotation. However, unlike typical multimodal tasks, sentiment inconsistency between image and text leads to sub-optimal performance of SSL algorithms. To address this issue, we propose SCDR, the first semi-supervised image-text sentiment recognition framework. To better utilize the discriminative features of each modality, we decouple the features into common and private parts and then use the private features to train unimodal classifiers for enhanced modality-specific sentiment representation. Considering the complex relations between modalities, we devise a modal-selection-based attention module that adaptively assesses the dominant sentiment modality at the sample level to guide the fusion of multimodal representations. Furthermore, to prevent the model predictions from relying too heavily on common features under the guidance of multimodal labels, we design a pseudo-label filtering strategy based on the degree to which a prediction matches the dominant modality. Extensive experiments and comparisons on five publicly available datasets demonstrate that SCDR outperforms state-of-the-art methods. The code is provided in the supplementary material and will be released to the public.
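The abstract names three components: common/private feature decoupling with unimodal heads, a sample-level modal-selection gate for fusion, and a pseudo-label filter that checks agreement with the dominant modality. The sketch below is a minimal PyTorch illustration of how such a pipeline could be wired together; all module names, dimensions, the gating formulation, and the confidence threshold are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCDRSketch(nn.Module):
    """Hypothetical sketch of the SCDR components described in the abstract."""

    def __init__(self, dim=256, num_classes=2):
        super().__init__()
        # Decouple each modality's encoding into common and private parts.
        self.img_common, self.img_private = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.txt_common, self.txt_private = nn.Linear(dim, dim), nn.Linear(dim, dim)
        # Unimodal classifiers trained on the private (modality-specific) features.
        self.img_head = nn.Linear(dim, num_classes)
        self.txt_head = nn.Linear(dim, num_classes)
        # Modal-selection gate: per-sample weights over the two modalities.
        self.gate = nn.Linear(2 * dim, 2)
        # Multimodal classifier over the gated fusion.
        self.fusion_head = nn.Linear(dim, num_classes)

    def forward(self, img_feat, txt_feat):
        ic, ip = self.img_common(img_feat), self.img_private(img_feat)
        tc, tp = self.txt_common(txt_feat), self.txt_private(txt_feat)
        img_logits = self.img_head(ip)
        txt_logits = self.txt_head(tp)
        # Sample-level attention over modalities guides the fusion.
        w = F.softmax(self.gate(torch.cat([ic + ip, tc + tp], dim=-1)), dim=-1)
        fused = w[:, :1] * (ic + ip) + w[:, 1:] * (tc + tp)
        return img_logits, txt_logits, self.fusion_head(fused), w

def filter_pseudo_labels(img_logits, txt_logits, fused_logits, gate_w,
                         threshold=0.95):
    """Keep an unlabeled sample only if the fused prediction is confident
    AND matches the prediction of the dominant (higher-gated) modality."""
    probs = F.softmax(fused_logits, dim=-1)
    conf, pseudo = probs.max(dim=-1)
    dominant = gate_w.argmax(dim=-1)  # 0 = image dominant, 1 = text dominant
    uni_pred = torch.where(dominant == 0,
                           img_logits.argmax(-1),
                           txt_logits.argmax(-1))
    mask = (conf > threshold) & (uni_pred == pseudo)
    return pseudo, mask
```

In an SSL training loop, only the pseudo-labels passing `mask` would supervise the unlabeled batch; the agreement check is what discourages predictions that lean solely on common features, as the abstract motivates.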
