Poster
SLADE: Shielding against Dual Exploits in Large Vision-Language Models
Md Zarif Hossain · Ahmed Imteaj
Large Vision-Language Models (LVLMs) have emerged as transformative tools in multimodal tasks, seamlessly integrating pretrained vision encoders to align visual and textual modalities. Prior works have highlighted the susceptibility of LVLMs to dual exploits (gradient-based and optimization-based jailbreak attacks), which leverage the expanded attack surface introduced by the image modality. Despite advancements in enhancing robustness, existing methods fall short in their ability to defend against dual exploits while preserving fine-grained semantic details and overall semantic coherence under intense adversarial perturbations. To bridge this gap, we introduce SLADE, a novel unsupervised adversarial fine-tuning scheme that enhances the resilience of CLIP-based vision encoders. SLADE’s dual-level contrastive learning approach balances the granular and the holistic, capturing fine-grained image details without losing sight of high-level semantic coherence. Extensive experiments demonstrate that SLADE-equipped LVLMs set a new benchmark for robustness against dual exploits while preserving fine-grained semantic details of perturbed images. Notably, SLADE achieves these results without compromising the core functionalities of LVLMs, such as instruction following, or requiring the computational overhead (e.g., large batch sizes, momentum encoders) commonly associated with traditional contrastive learning methods. The code is provided in the supplementary material with this submission.
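Since the abstract describes SLADE only at a high level, the following is a minimal, hypothetical sketch of what an unsupervised, dual-level adversarial fine-tuning objective for a CLIP-style vision encoder could look like. The toy encoder, PGD attack settings, loss weighting, and the specific granular (patch-level) and holistic (global-embedding) terms are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: unsupervised adversarial fine-tuning of a vision encoder
# with a dual-level (granular + holistic) objective, in the spirit of SLADE.
# All design choices below (attack budget, loss terms, toy encoder) are assumptions.
import torch
import torch.nn.functional as F


class ToyVisionEncoder(torch.nn.Module):
    """Stand-in for a CLIP ViT vision tower.

    Returns (patch_features, global_embedding); a real setup would use a
    pretrained CLIP image encoder instead.
    """
    def __init__(self, dim=128, patch=16):
        super().__init__()
        self.proj = torch.nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.head = torch.nn.Linear(dim, dim)

    def forward(self, x):
        feats = self.proj(x)                          # (B, dim, H/ps, W/ps)
        patches = feats.flatten(2).transpose(1, 2)    # (B, N, dim) patch tokens
        pooled = patches.mean(dim=1)                  # (B, dim) global pooling
        return patches, self.head(pooled)


def pgd_attack(embed_fn, images, eps=4 / 255, alpha=1 / 255, steps=10):
    """L-inf PGD that pushes adversarial embeddings away from the clean ones
    (unsupervised: no labels are used)."""
    with torch.no_grad():
        clean_emb = F.normalize(embed_fn(images), dim=-1)
    adv = (images + torch.empty_like(images).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        adv_emb = F.normalize(embed_fn(adv), dim=-1)
        similarity = (adv_emb * clean_emb).sum(dim=-1).mean()
        grad = torch.autograd.grad(similarity, adv)[0]
        # Step in the direction that *decreases* similarity to the clean embedding.
        adv = adv.detach() - alpha * grad.sign()
        adv = (images + (adv - images).clamp(-eps, eps)).clamp(0, 1).detach()
    return adv


def dual_level_loss(student, frozen_teacher, clean, adv, tau=0.1):
    """Holistic term: align adversarial global embeddings with the frozen
    teacher's clean embeddings (in-batch InfoNCE, no momentum encoder).
    Granular term: keep per-patch features of adversarial images close to
    those of clean images, preserving fine-grained detail."""
    adv_patches, adv_global = student(adv)
    with torch.no_grad():
        ref_patches, ref_global = frozen_teacher(clean)

    a = F.normalize(adv_global, dim=-1)
    r = F.normalize(ref_global, dim=-1)
    logits = a @ r.t() / tau
    targets = torch.arange(a.size(0), device=a.device)
    holistic = F.cross_entropy(logits, targets)

    granular = 1 - F.cosine_similarity(adv_patches, ref_patches, dim=-1).mean()
    return holistic + granular


if __name__ == "__main__":
    torch.manual_seed(0)
    student = ToyVisionEncoder()
    teacher = ToyVisionEncoder()
    teacher.load_state_dict(student.state_dict())
    for p in teacher.parameters():
        p.requires_grad_(False)

    opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
    images = torch.rand(8, 3, 224, 224)  # placeholder batch of clean images
    adv = pgd_attack(lambda x: student(x)[1], images)
    loss = dual_level_loss(student, teacher, images, adv)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"dual-level loss: {loss.item():.4f}")
```

Note that this sketch uses only in-batch negatives against a frozen copy of the original encoder, which is one plausible way to avoid the large batch sizes and momentum encoders the abstract says SLADE dispenses with; the actual SLADE objective may differ.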