Hidden Dangers of Compositional Generation: Diagnosing Semantic Safety Failures in Text-to-Image Models
\begin{abstract}
Text-to-Image (\textbf{T2I}) models have made significant progress in generating high-quality images, with compositional visual generation emerging as a key capability that enables them to synthesize coherent, natural scenes from multiple discrete concepts. However, this powerful compositionality, while enhancing creativity, also introduces new safety risks: combinations of individually innocuous concepts can produce high-risk images without any single element explicitly expressing harmful content. Motivated by this observation, we propose \textbf{CoRA} (Composable Reassembly Attack), an attack method that preserves the original semantics of a prompt while bypassing safety filters. Unlike traditional compositional generation approaches that rely on modifying the sampling process, \textbf{CoRA} operates solely in the text space under a black-box setting, iteratively rewriting and steering prompts through interactive queries. Specifically, \textbf{CoRA} decomposes a potentially harmful intent into a set of fine-grained, superficially benign but semantically complete visual elements, and then iteratively selects and reassembles these elements to guide the target \textbf{T2I} model to recombine them without triggering safety checks, thereby recovering the original malicious semantics. Experimental results show that \textbf{CoRA} substantially improves the attack success rate (\textbf{ASR}) across several mainstream open-source and commercial \textbf{T2I} models, producing higher-risk outputs while maintaining semantic consistency. \textcolor{red}{\textbf{Warning:} This paper contains model-generated content that may be considered offensive or disturbing.}
\end{abstract}