SeD-UD: An Influence-Driven and Hierarchically-Decoupled Information Bottleneck for Multimodal Intent Recognition
Abstract
Multimodal intent recognition (MIR) is hindered by substantial redundancy and noise in text, speech, and visual inputs, which weaken feature distinctiveness and ultimately harm recognition performance. Recent approaches based on the information bottleneck (IB) principle mitigate this issue by compressing and reconstructing features to obtain compact, noise-reduced representations, but they still have two major drawbacks. First, conventional IB uses a fixed bottleneck dimension and therefore cannot accommodate sample-dependent variations in redundancy and noise. Second, handling redundancy and noise simultaneously within a single compression process leads to incomplete feature purification. In this paper, we propose SeD-UD, a novel framework that incorporates influence-driven input-adaptive bottleneck (IDAB) modules within a hierarchically-decoupled IB strategy. Given a redundancy or noise influence factor, an IDAB module dynamically adjusts its bottleneck dimension and selects the optimal parameters for compression and reconstruction, yielding a favorable trade-off between information preservation and interference suppression. The hierarchically-decoupled IB strategy processes redundancy and noise separately, via separated de-redundancy and unified denoising, both built on IDAB modules. Extensive experiments on benchmark datasets show that SeD-UD outperforms current state-of-the-art models.
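To make the input-adaptive idea concrete, the following is a minimal sketch of a bottleneck whose width depends on a per-sample influence factor. It is an illustration under stated assumptions, not the IDAB module itself: the function names (`pick_bottleneck_dim`, `make_bank`, `adaptive_bottleneck`), the linear mapping from influence factor to width, the candidate widths, and the random tied-weight projections are all hypothetical choices that the abstract does not specify.

```python
import random


def pick_bottleneck_dim(influence, candidate_dims):
    """Map an influence factor in [0, 1] to one candidate width.

    Assumption: a higher influence factor (more redundancy or noise)
    should select a narrower bottleneck.
    """
    influence = min(max(influence, 0.0), 1.0)
    dims = sorted(candidate_dims, reverse=True)  # widest first
    idx = min(int(influence * len(dims)), len(dims) - 1)
    return dims[idx]


def make_bank(input_dim, candidate_dims, seed=0):
    """Build one (k x input_dim) weight matrix per candidate width k.

    Random weights stand in for learned parameters (hypothetical).
    """
    rng = random.Random(seed)
    scale = input_dim ** -0.5
    return {
        k: [[rng.gauss(0.0, scale) for _ in range(input_dim)] for _ in range(k)]
        for k in candidate_dims
    }


def compress(x, weights):
    """Project x down to len(weights) dimensions."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def reconstruct(z, weights):
    """Map the code back to the input space with the transposed weights."""
    input_dim = len(weights[0])
    return [
        sum(weights[j][i] * z[j] for j in range(len(z)))
        for i in range(input_dim)
    ]


def adaptive_bottleneck(x, influence, bank):
    """Select the width and matching parameters, then compress and reconstruct."""
    k = pick_bottleneck_dim(influence, bank.keys())
    weights = bank[k]
    z = compress(x, weights)
    return z, reconstruct(z, weights)


bank = make_bank(input_dim=16, candidate_dims=(8, 4, 2))
x = [0.1 * i for i in range(16)]
z_clean, x_hat_clean = adaptive_bottleneck(x, 0.1, bank)  # low influence
z_noisy, x_hat_noisy = adaptive_bottleneck(x, 0.9, bank)  # high influence
print(len(z_clean), len(z_noisy))  # prints: 8 2
```

In this sketch a sample with a low influence factor keeps a wider code, while a sample with a high influence factor is squeezed through a narrower one. A fixed-dimension bottleneck would apply the same width to both.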