VCP-Attack: Visual-Contrastive Projection for Transferable Black-Box Targeted Attacks on Large Vision-Language Models
Jiawei Zhao ⋅ Minjie Du ⋅ Zihan Qin ⋅ Zhuoran Wang ⋅ Lizhe Xie ⋅ Yining HU
Abstract
Large vision-language models (LVLMs) have achieved impressive performance across a variety of multimodal tasks, yet remain vulnerable to targeted adversarial attacks, particularly in black-box settings. In this paper, we propose \textbf{VCP-Attack}, a transferable targeted attack framework that combines structured contrastive supervision with subspace-guided perturbation optimization. Specifically, we employ a dynamic PCA-based projection to constrain perturbations within semantically meaningful low-dimensional subspaces, and design a multi-sample contrastive loss to align adversarial features with target semantics while pushing them away from source semantics. Extensive experiments on seven open-source and three proprietary LVLMs—including GPT-4o, Claude, and Gemini—show that VCP-Attack achieves \textbf{state-of-the-art} performance in black-box targeted attacks. Under a fixed perturbation budget ($\epsilon = 16/255$), our method achieves an average attack success rate (ASR) of 94.2\% on open-source models and 83.1\% on proprietary models, surpassing the strongest baselines by 23.3\% and 16.8\%, respectively. Notably, VCP-Attack achieves a 95.6\% ASR on GPT-4o. Comprehensive ablation studies and visualizations further validate the effectiveness of the dynamic subspace projection and semantic contrastive supervision. Although our evaluation focuses on image captioning, the approach is model-agnostic and shows strong potential for broader black-box adversarial settings in vision-language tasks.
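To make the two ideas named in the abstract concrete, below is a minimal, illustrative PyTorch sketch of (1) restricting the perturbation update to a low-dimensional PCA subspace and (2) a contrastive objective that pulls adversarial features toward target-class embeddings while pushing them away from source embeddings. The encoder, function names, and hyperparameters (e.g. `k`, `tau`, `alpha`, `steps`) are hypothetical placeholders and not the authors' implementation; this is a sketch of the general technique, not the VCP-Attack algorithm itself.

```python
# Hypothetical sketch of subspace-projected, contrastively supervised attack steps.
import torch
import torch.nn.functional as F


def pca_project(grad: torch.Tensor, k: int = 32) -> torch.Tensor:
    """Project a (C, H, W) gradient onto its top-k principal directions.

    Each spatial row is treated as a sample; only the leading k components
    are kept, and the gradient is reconstructed inside that subspace.
    """
    c, h, w = grad.shape
    x = grad.reshape(c * h, w)
    x = x - x.mean(dim=0, keepdim=True)
    _, _, v = torch.pca_lowrank(x, q=min(k, w))    # v: (w, k)
    return (x @ v @ v.T).reshape(c, h, w)          # reconstruction in the subspace


def contrastive_loss(f_adv, f_tgt, f_src, tau: float = 0.1):
    """Pull the adversarial feature f_adv (D,) toward target samples f_tgt (N, D)
    and push it away from source samples f_src (M, D)."""
    pos = F.cosine_similarity(f_adv.unsqueeze(0), f_tgt, dim=-1) / tau
    neg = F.cosine_similarity(f_adv.unsqueeze(0), f_src, dim=-1) / tau
    return -torch.log(pos.exp().sum() / (pos.exp().sum() + neg.exp().sum()))


def attack(encoder, image, target_feats, source_feats,
           eps=16 / 255, alpha=1 / 255, steps=100):
    """PGD-style loop whose update direction is projected into the PCA subspace."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        f_adv = encoder(image + delta)
        loss = contrastive_loss(f_adv, target_feats, source_feats)
        loss.backward()
        with torch.no_grad():
            step = pca_project(delta.grad.squeeze(0)).unsqueeze(0).sign()
            delta -= alpha * step                   # descend to minimize the loss
            delta.clamp_(-eps, eps)                 # keep within the L-inf budget
            delta.grad.zero_()
    return (image + delta).detach()
```

In this sketch the subspace projection acts as a filter on the raw gradient before the sign step, and the contrastive loss replaces a single-target feature-matching loss with a multi-sample pull/push objective; both choices are meant only to illustrate the mechanism described above.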