Tackling Model Bias via Game-theoretic Multi-agent Collaboration Framework for Hateful Meme Classification
Abstract
Hateful meme classification aims to identify memes containing hateful content and has become increasingly important in the era of social media dominance. Large multimodal models (LMMs) have significantly enhanced the understanding of multimodal content, advancing this field. However, cognitive biases in LMMs can impede effective collaboration among models. To address this issue, we introduce \textbf{GECO}, a \textbf{G}ame-theoretic multi-ag\textbf{E}nt \textbf{C}ollaboration framew\textbf{O}rk that organizes multiple LMMs into interacting agents and employs game-theoretic principles to guide them toward an optimal cooperative equilibrium. GECO integrates a mixed bonus scheme that combines individual accuracy with cross-model agreement to promote convergence toward a consistent cooperative solution. In addition, we implement efficient policy learning and introduce a penalty coefficient to optimize the framework and stabilize training. Extensive experiments on five publicly available benchmarks demonstrate that our framework achieves new state-of-the-art performance.
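The mixed bonus scheme can be illustrated with a minimal sketch. The function below is a hypothetical rendering, not the paper's actual implementation: the weighting `alpha`, the flat `penalty`, and the agreement measure (fraction of other agents making the same prediction) are all illustrative assumptions, chosen only to show how an individual-accuracy term and a cross-model-agreement term might be combined into one per-agent reward.

```python
# Hypothetical sketch of a mixed bonus scheme in the spirit of GECO.
# alpha, penalty, and the agreement definition are illustrative assumptions.
from typing import List


def mixed_bonus(predictions: List[int], correct: List[bool],
                alpha: float = 0.5, penalty: float = 0.1) -> List[float]:
    """Reward each agent for its own accuracy plus agreement with peers,
    minus a fixed penalty term intended to aid training stability."""
    n = len(predictions)
    rewards = []
    for i, pred in enumerate(predictions):
        accuracy_term = 1.0 if correct[i] else 0.0
        # Fraction of the other agents whose prediction matches agent i's.
        agreement = sum(1 for j, p in enumerate(predictions)
                        if j != i and p == pred) / max(n - 1, 1)
        rewards.append(alpha * accuracy_term
                       + (1 - alpha) * agreement
                       - penalty)
    return rewards
```

Under this sketch, an agent that is both correct and in agreement with its peers receives the highest reward, so repeated play pushes the agents toward a consistent cooperative solution.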