CoRiM: Conflict-driven Risk Minimization for Dynamic Multimodal Fusion
Shihao Zou ⋅ Wei Wei
Abstract
Dynamic multimodal fusion methods lack robust theoretical guidance for handling modal conflicts and inconsistent data quality. Recent theory-based works tie fusion weights to indirect scalar proxies (e.g., loss or confidence), but this paradigm struggles to capture the risk driven directly by distribution inconsistencies. In this paper, we propose a **Co**nflict-driven **Ri**sk **M**inimization (**CoRiM**) dynamic fusion paradigm. Specifically, we redefine dynamic fusion as a principled, per-sample, direct risk minimization task. To this end, we first design a novel, differentiable Modality Conflict Risk (MCR) function, $\mathcal{R}(w)$, which quantifies risk by directly modeling fused uncertainty and inter-modal consistency. Second, we identify that minimizing $\mathcal{R}(w)$ is a non-convex constrained optimization problem over the probability simplex. To solve this problem efficiently, we adopt the projection-free Frank-Wolfe (FW) algorithm, which is well suited to optimization over the simplex. We prove that $\mathcal{R}(w)$ is $L$-smooth, which provides theoretical guarantees for the convergence of the FW algorithm on our non-convex objective. Extensive experiments on multiple benchmark datasets demonstrate that CoRiM outperforms current state-of-the-art methods in high-conflict and noisy environments, validating the robustness of our method.
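As an illustration of the optimization step described above, the following is a minimal sketch of Frank-Wolfe over the probability simplex. The abstract does not give the form of $\mathcal{R}(w)$, so `toy_risk` is a hypothetical stand-in (entropy of the fused distribution plus a weighted disagreement term), not the authors' MCR function. The central-difference gradient and the textbook step size $2/(t+2)$ are likewise assumptions made for brevity.

```python
import numpy as np

def toy_risk(w, P, lam=1.0):
    """Hypothetical stand-in for R(w); NOT the MCR function from the paper.

    P has shape (M, K): one class distribution per modality.
    """
    p = w @ P  # fused distribution, shape (K,)
    entropy = -np.sum(p * np.log(np.clip(p, 1e-12, None)))   # fused uncertainty
    disagreement = np.sum(w * np.sum((P - p) ** 2, axis=1))  # inter-modal inconsistency
    return entropy + lam * disagreement

def numerical_grad(f, w, eps=1e-6):
    """Central-difference gradient; an assumption that avoids a hand derivation."""
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (f(w + e) - f(w - e)) / (2.0 * eps)
    return g

def fw_gap(f, w):
    """Frank-Wolfe gap <grad, w - s>, where s is the best simplex vertex."""
    g = numerical_grad(f, w)
    return float(g @ w - g.min())

def frank_wolfe_simplex(f, w0, n_iters=50, tol=1e-8):
    """Projection-free minimization of f over the probability simplex."""
    w = np.asarray(w0, dtype=float).copy()
    for t in range(n_iters):
        g = numerical_grad(f, w)
        # Linear minimization oracle: over the simplex, the minimizer of
        # <g, s> is the vertex at the smallest gradient coordinate.
        s = np.zeros_like(w)
        s[int(np.argmin(g))] = 1.0
        if float(g @ (w - s)) < tol:  # approximately stationary
            break
        gamma = 2.0 / (t + 2.0)
        # A convex combination of feasible points stays feasible,
        # so no projection is required.
        w = (1.0 - gamma) * w + gamma * s
    return w, fw_gap(f, w)

# Three modalities, three classes; the second modality conflicts with the others.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.6, 0.3, 0.1]])
w_star, gap = frank_wolfe_simplex(lambda w: toy_risk(w, P), np.ones(3) / 3)
print("weights:", np.round(w_star, 3), "FW gap:", gap)
```

Each update is a convex combination of the current weights and a simplex vertex, so the iterates remain valid fusion weights without a projection step. For a non-convex objective, the Frank-Wolfe gap serves as the stationarity measure, and smoothness of the objective is the standard condition under which this gap can be shown to shrink.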