Transform to Transfer: Boosting Adversarial Attack Transferability on Vision-Language Pre-training Models
Abstract
Vision-Language Pre-training (VLP) models, while achieving state-of-the-art performance on various multimodal tasks, exhibit significant vulnerability to multimodal adversarial examples. In black-box attack scenarios against VLP models, a key challenge lies in the limited transferability of these adversarial examples. Existing methods for enhancing transferability often suffer from excessive dependence on the source model and a reliance on a limited, fixed set of transformation techniques. To overcome these limitations, we propose a novel Transform to Transfer Attack (TTA) method. Our approach introduces a learnable transformation mechanism that adaptively selects optimal combinations of transformations to maximize input diversity, and incorporates integrated gradients to mitigate overfitting on the source model, thereby refining the attack optimization process. Extensive experiments demonstrate that TTA achieves outstanding attack performance on downstream tasks, outperforming current state-of-the-art attack methods across different VLP architectures.
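The abstract refers to integrated gradients as the tool used to reduce overfitting on the source model. As a minimal, hedged illustration of that underlying technique (not the paper's actual attack pipeline), the sketch below approximates integrated gradients with a Riemann sum for a toy differentiable function F(x) = Σx², where the function, its analytic gradient, and all names are illustrative assumptions:

```python
import numpy as np

def F(x):
    # Toy differentiable "model" standing in for the source VLP model's loss.
    return np.sum(x ** 2)

def grad_F(x):
    # Analytic gradient of F (2x); a real attack would use autograd instead.
    return 2 * x

def integrated_gradients(x, baseline, grad_fn, steps=100):
    # Riemann-sum (midpoint rule) approximation of
    # IG_i(x) = (x_i - b_i) * integral_0^1 dF/dx_i(b + a*(x - b)) da
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    avg_grad = total / steps
    return (x - baseline) * avg_grad

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
ig = integrated_gradients(x, baseline, grad_F)

# Completeness axiom: attributions sum to F(x) - F(baseline).
print(np.allclose(ig.sum(), F(x) - F(baseline)))
```

Because integrated gradients average gradients along the whole path from a baseline to the input rather than using the single local gradient, an attack guided by them is less tied to the source model's local loss surface, which is the intuition behind using them to curb source-model overfitting.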