FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration
Muxi Chen ⋅ Zhaohua Zhang ⋅ Chenchen Zhao ⋅ Mingyang Chen ⋅ Wenyu Jiang ⋅ Tianwen Jiang ⋅ Jianhuan Zhuo ⋅ Yu Tang ⋅ Qiuyong Xiao ⋅ Jihong Zhang ⋅ Qiang Xu
Abstract
Static benchmark-driven evaluation has provided a valuable foundation for analyzing Text-to-Image (T2I) models. However, the fixed and predetermined prompt sets in benchmarks inherently limit diagnostic depth, making it difficult to uncover the full landscape of models' systematic failures or to isolate their root causes. We argue for a complementary paradigm: $\textbf{active exploration}$, and introduce $\textbf{FailureAtlas}$, the first framework designed to autonomously explore and map the vast failure landscapes of T2I models at scale. Unlike benchmarks that evaluate a fixed prompt set, $\textbf{FailureAtlas}$ performs guided exploration of the input space, framing error discovery as a structured search for minimal, failure-inducing concept combinations. While this search is combinatorially explosive, we make it tractable with novel acceleration techniques. When applied to Stable Diffusion models, our method uncovers hundreds of thousands of previously unknown error slices (e.g., over 247,000 in SD1.5 alone) and provides the first large-scale evidence linking these failures to data scarcity in the training set. By providing a principled and scalable engine for deep model auditing, $\textbf{FailureAtlas}$ establishes a new, diagnostic-first methodology to guide the development of more robust generative AI.
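The core idea of searching for minimal, failure-inducing concept combinations can be sketched as a smallest-first search with superset pruning. This is an illustrative sketch only, not the authors' implementation: the `fails` oracle stands in for the actual T2I generation-and-evaluation loop, and `find_minimal_failure_slices`, `toy_oracle`, and the example concepts are hypothetical names invented here.

```python
from itertools import combinations

def find_minimal_failure_slices(concepts, fails, max_size=3):
    """Enumerate concept combinations smallest-first; record a combination
    only if it fails and no failing subset was found earlier, so every
    recorded slice is minimal. `fails` is a hypothetical oracle standing
    in for generating images from a prompt and evaluating them."""
    minimal = []
    for size in range(1, max_size + 1):
        for combo in combinations(concepts, size):
            combo = frozenset(combo)
            # Prune: any superset of a known minimal failure is redundant.
            if any(m <= combo for m in minimal):
                continue
            if fails(combo):
                minimal.append(combo)
    return minimal

# Toy oracle: pretend prompts combining "glass" and "transparent" fail,
# as does "six fingers" on its own (purely invented failure modes).
def toy_oracle(combo):
    return combo == {"six fingers"} or {"glass", "transparent"} <= combo

slices = find_minimal_failure_slices(
    ["glass", "transparent", "six fingers", "red"], toy_oracle)
```

In a real system the oracle call dominates cost, which is why the pruning step (and the paper's acceleration techniques) matter: each skipped superset saves a full generate-and-evaluate round trip.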