

Poster

BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

Zenghui Yuan · Jiawen Shi · Pan Zhou · Neil Zhenqiang Gong · Lichao Sun

ExHall D Poster #361
[ Paper PDF ]
Sun 15 Jun 2 p.m. PDT — 4 p.m. PDT

Abstract: Multi-modal large language models (MLLMs) extend large language models (LLMs) to process multi-modal information, enabling them to generate responses to image-text inputs. MLLMs have been incorporated into diverse multi-modal applications, such as autonomous driving and medical diagnosis, in a plug-and-play manner without fine-tuning. This deployment paradigm increases the vulnerability of MLLMs to backdoor attacks. However, existing backdoor attacks against MLLMs achieve limited effectiveness and stealthiness. In this work, we propose $\textit{BadToken}$, the first token-level backdoor attack to MLLMs. BadToken introduces two novel backdoor behaviors: $\textit{Token-substitution}$ and $\textit{Token-addition}$, which enable flexible and stealthy attacks by making token-level modifications to the original output for backdoored inputs. We formulate a general optimization problem that incorporates the two backdoor behaviors to maximize attack effectiveness. We evaluate BadToken on two open-source MLLMs and various tasks. Our results show that the attack preserves the model's utility while achieving high attack success rates and stealthiness. We also demonstrate the real-world threats of BadToken in two scenarios, i.e., autonomous driving and medical diagnosis. Furthermore, we consider defenses including fine-tuning and input purification. Our results highlight the threat posed by our attack.
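To make the two backdoor behaviors concrete, the following minimal Python sketch illustrates, purely conceptually, what token-substitution and token-addition amount to as edits on an output token sequence. The specific tokens, the traffic-light caption, and the appended URL are illustrative assumptions rather than details from the paper, and the actual attack embeds these behaviors into the backdoored model via the optimization described above, not by post-processing the model's output.

```python
# Conceptual sketch only -- not the paper's implementation.
# All example tokens below (trigger target "green", suffix "visit example.com")
# are hypothetical placeholders chosen for illustration.

def token_substitution(output_tokens, source_token, target_token):
    """Token-substitution: for a backdoored input, a chosen source token in the
    output is replaced with an attacker-specified target token."""
    return [target_token if t == source_token else t for t in output_tokens]

def token_addition(output_tokens, malicious_tokens):
    """Token-addition: attacker-specified tokens are appended to the otherwise
    unchanged output for a backdoored input."""
    return output_tokens + malicious_tokens

# Toy captioning-style output for a triggered image-text input.
clean_output = ["the", "traffic", "light", "is", "red"]
print(token_substitution(clean_output, "red", "green"))
# ['the', 'traffic', 'light', 'is', 'green']
print(token_addition(clean_output, [".", "visit", "example.com"]))
# ['the', 'traffic', 'light', 'is', 'red', '.', 'visit', 'example.com']
```

Because only a few output tokens change while the rest of the response stays intact, such token-level modifications are harder to notice than replacing the entire output, which is the stealthiness argument made in the abstract.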
