Hierarchical Attacks for Multi‑Modal Multi‑Agent Reasoning
Abstract
Multi‑modal multi‑agent systems (MM‑MAS) have gained increasing attention for their capacity to enable complex reasoning and coordination across diverse modalities. As these systems expand in scale and functionality, investigating their potential vulnerabilities becomes increasingly important. However, existing studies on adversarial attacks in multi‑agent systems primarily focus on isolated agents or unimodal settings, leaving the vulnerabilities of MM‑MAS largely underexplored. To bridge this gap, we introduce HAM\textsuperscript{3}, a Hierarchical Attack framework for multi‑modal multi‑agent systems that decomposes attacks into three interconnected layers. At the perception layer, HAM\textsuperscript{3} perturbs visual inputs, textual inputs, and their fused visual–textual representations. At the communication layer, it corrupts message content and interaction topology, for example by manipulating shared context or communication links to distort collective information flow. At the reasoning layer, it interferes with each agent's cognitive pipeline, biasing reasoning trajectories and ultimately compromising final decisions. We evaluate HAM\textsuperscript{3} on the GQA benchmark using multi‑agent systems built on distinct reasoning paradigms, including ReAct, Plan‑and‑Solve, and Reflexion. Experiments demonstrate that our framework achieves an Attack Success Rate of up to 78.3\%, with reasoning‑layer attacks being the most effective; more than half of the successful attacks lead multiple agents to produce consistent errors. These findings offer valuable insights for building more robust and interpretable multi‑agent intelligence.