Omni-Attack: Adversarial Attacks on Open-Ended VQA in Black-Box Multimodal LLMs
Kai Hu ⋅ Weichen Yu ⋅ Li Zhang ⋅ Alexander Robey ⋅ Andy Zou ⋅ Haoqi Hu ⋅ Chengming Xu ⋅ Matt Fredrikson
Abstract
Multimodal large language models (MLLMs) have achieved remarkable success across diverse applications, from autonomous driving to document understanding. As these models are deployed in safety-critical contexts, understanding their adversarial robustness becomes crucial. However, current evaluations focus primarily on simple tasks such as coarse-grained classification and employ inconsistent evaluation protocols, hindering rigorous comparison of attack methods. We introduce AdvRobustBench, a comprehensive adversarial robustness benchmark for MLLMs comprising 1,000 examples across visual question answering (VQA) and optical character recognition (OCR) tasks, drawn from widely used MLLM benchmarks (MMBench, MMStar, OCRBench-v2). We further propose Omni-Attack, a novel transfer-based black-box attack method that addresses key challenges in attacking open-ended question-answering systems. Our approach introduces (i) a target-construction pipeline that generates question-conditioned textual and visual targets to provide stronger optimization signals, and (ii) a location-aware attack strategy for OCR that enables spatially precise perturbations. Extensive experiments demonstrate that Omni-Attack achieves strong targeted attack success rates (up to 71.8\% on GPT-4.1 at $\varepsilon=8/255$) across both proprietary models (GPT-4.1, Claude 3.7, Gemini 2.0) and open-source MLLMs, revealing significant vulnerabilities in current multimodal systems. Our benchmark and findings establish a foundation for developing more robust MLLMs.
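To make the threat model concrete, the sketch below shows a generic targeted, transfer-based $\ell_\infty$ attack of the kind the abstract describes: a perturbation is optimized against a white-box surrogate vision encoder to match a target embedding, then the perturbed image is sent to the black-box MLLM. This is a minimal illustration under assumed details, not the paper's actual Omni-Attack pipeline (the question-conditioned target construction and location-aware OCR strategy are not reproduced here); all function and variable names are illustrative.

```python
# Minimal sketch of a targeted transfer attack under an L_inf budget.
# Assumptions (not from the paper): a differentiable surrogate `encoder`
# (e.g., an open-source CLIP image tower) and a precomputed
# `target_embedding` standing in for a question-conditioned visual target.
import torch
import torch.nn.functional as F

def targeted_transfer_attack(encoder, image, target_embedding,
                             eps=8 / 255, alpha=1 / 255, steps=100):
    """PGD on the surrogate: push the image embedding toward the target
    embedding, then transfer the perturbed image to the black-box model."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        emb = encoder(adv)  # surrogate image embedding, shape [B, D]
        # Targeted objective: minimize distance to the target embedding.
        loss = -F.cosine_similarity(emb, target_embedding, dim=-1).mean()
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv - alpha * grad.sign()               # signed gradient step
            adv = image + (adv - image).clamp(-eps, eps)  # project to L_inf ball
            adv = adv.clamp(0.0, 1.0)                     # keep a valid image
    return adv.detach()
```

In this setting, $\varepsilon = 8/255$ bounds the per-pixel perturbation, and attack success is judged by whether the black-box MLLM's open-ended answer matches the attacker's target rather than by a fixed label set.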