RunawayEvil: Jailbreaking Image-to-Video Generative Models
Abstract
Image-to-Video (I2V) generation represents a frontier in content creation, where models synthesize dynamic visual sequences by jointly reasoning over image and text prompts. This multimodal grounding enables diverse control over video attributes. However, it is precisely this capability that introduces a critical security blind spot: by exploiting the interplay between visual and textual cues, attackers can launch multimodal jailbreak attacks that severely compromise output safety. Despite the growing deployment of safety mechanisms in real-world I2V systems, such cross-modal threats remain unexplored. Existing attack methods are confined to single-modal settings, relying solely on isolated text or image perturbations, which severely limits their effectiveness. To bridge this gap, we propose RunawayEvil, the first multimodal jailbreaking framework for I2V models with dynamic evolutionary capability. Built on a Strategy-Tactic-Action paradigm, our framework mounts self-amplifying attacks through three core components: (1) a strategy-aware command unit that self-evolves attack strategies through reinforcement learning-driven strategy customization and large language model (LLM)-based strategy exploration; (2) a multimodal tactical planning unit that generates synergistic text jailbreak instructions and image tampering guidelines based on the selected strategies; and (3) a tactical action unit that executes and evaluates the coordinated attacks. This self-evolving architecture allows the framework to continuously adapt and intensify its attack strategies without human intervention. Extensive experiments demonstrate that RunawayEvil achieves state-of-the-art attack success rates on mainstream I2V models such as Open-Sora 2.0 and CogVideoX. This work provides a critical tool for probing and mitigating multimodal vulnerabilities, laying a foundation for more robust video generation systems.
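The Strategy-Tactic-Action loop sketched in the abstract can be illustrated schematically. The following is a minimal, self-contained sketch under stated assumptions, not the authors' implementation: the command unit is stood in for by an epsilon-greedy bandit over a hypothetical strategy pool, and the planning and action units are stubs (a real system would call an LLM planner, the target I2V model, and a safety evaluator). All class names, strategy labels, and the reward model are illustrative assumptions.

```python
import random

# Hypothetical strategy pool; the paper's actual strategies may differ.
STRATEGIES = ["role_play", "scene_nesting", "semantic_drift"]

class CommandUnit:
    """Strategy-aware command unit, sketched as an epsilon-greedy bandit.

    Stands in for the RL-driven strategy customization described in the
    abstract; LLM-based strategy exploration is omitted for brevity.
    """
    def __init__(self, strategies, epsilon=0.2, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.values = {s: 0.0 for s in strategies}   # running mean reward
        self.counts = {s: 0 for s in strategies}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best value.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, strategy, reward):
        # Incremental mean update of the chosen strategy's estimated value.
        self.counts[strategy] += 1
        n = self.counts[strategy]
        self.values[strategy] += (reward - self.values[strategy]) / n

def plan_tactics(strategy, prompt):
    """Multimodal tactical planning unit (stub): paired text + image edits."""
    return {
        "text_instruction": f"[{strategy}] rewrite of: {prompt}",
        "image_guideline": f"[{strategy}] tampering guideline",
    }

def act_and_evaluate(tactics, rng):
    """Tactical action unit (stub): execute the attack and score the output.

    A real system would query the target I2V model here and score the
    generated video with a safety evaluator; a random score is a placeholder.
    """
    return rng.random()

def run_attack(prompt, rounds=50, seed=0):
    """Self-evolving loop: select strategy, plan tactics, act, update."""
    rng = random.Random(seed)
    cmd = CommandUnit(STRATEGIES, seed=seed)
    best = (None, -1.0)
    for _ in range(rounds):
        strategy = cmd.select()
        tactics = plan_tactics(strategy, prompt)
        reward = act_and_evaluate(tactics, rng)
        cmd.update(strategy, reward)          # attack self-evolves over rounds
        if reward > best[1]:
            best = (tactics, reward)
    return best, cmd.values
```

The loop captures the self-amplifying behavior at a high level: strategies that yield higher evaluation scores are selected more often in later rounds, without human intervention.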