Diffusion Probe: Generated Image Result Prediction Using CNN Probes
Abstract
Text-to-image (T2I) diffusion models currently lack an efficient mechanism for early quality assessment, which forces costly trial-and-error in scenarios that require many generations (e.g., prompt iteration, agent-based image generation, Flow-GRPO). To address this, we first reveal a strong correlation between the attention distribution early in the diffusion process and the final image quality. Building on this insight, we introduce Diffusion Probe, a framework that uses the model's internal cross-attention maps as a predictive signal. We propose a lightweight predictor trained to map statistical properties of these early cross-attention distributions, extracted from the initial denoising steps, directly to the overall quality of the final image. This allows the probe to forecast multiple aspects of image quality, independent of the specific ground-truth quality metric, long before synthesis is complete. We empirically validate the reliability and generalizability of Diffusion Probe through its consistently strong predictive accuracy across a wide range of conditions. On diverse T2I models (e.g., SDXL, FLUX, Qwen-Image), throughout broad early-denoising windows, across various resolutions, and with different quality metrics, it achieves high correlation (PCC > 0.7) and classification performance (AUC-ROC > 0.9). This reliability is further demonstrated in practice by optimizing T2I workflows that benefit from early, quality-guided decisions, such as Prompt Optimization, Seed Selection, and Accelerated RL Training. In these applications, the probe's early signal enables more targeted sampling strategies, avoiding costly computation on low-potential paths.
This yields a dual benefit: reduced computational overhead and improved final output quality, establishing Diffusion Probe as a model-agnostic, broadly applicable tool for improving T2I efficiency.
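
To make the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a cross-attention map is available as a 2D grid of non-negative weights for one prompt token, and it uses normalized entropy, peak mass, and variance as stand-ins for the "statistical properties" (the abstract does not specify which statistics are used). The helper names attention_stats, pearson, and auc_roc are hypothetical. The last two compute the standard Pearson correlation coefficient (PCC) and the rank-based AUC-ROC, the two metrics the abstract reports.

```python
import math
from typing import List, Sequence


def attention_stats(attn: Sequence[Sequence[float]]) -> List[float]:
    """Summarize one H x W cross-attention map as
    [normalized entropy, peak mass, variance] of the normalized map.
    These three statistics are illustrative assumptions only."""
    flat = [float(v) for row in attn for v in row]
    n = len(flat)
    total = sum(flat)
    if n == 0 or total <= 0:
        raise ValueError("attention map must be non-empty with positive mass")
    p = [v / total for v in flat]  # normalize to a probability distribution
    entropy = -sum(q * math.log(q) for q in p if q > 0)
    norm_entropy = entropy / math.log(n) if n > 1 else 0.0
    mean = 1.0 / n  # mean of a normalized map is always 1/n
    variance = sum((q - mean) ** 2 for q in p) / n
    return [norm_entropy, max(p), variance]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and targets."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined for constant inputs")
    return cov / (sx * sy)


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as the probability that a positive sample outscores a
    negative one (ties count as one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

In such a setup, feature vectors from attention_stats at a few early denoising steps would be fed to a small learned predictor, and the predictor's scores would be compared against a chosen quality metric with pearson and, after thresholding the metric into good/bad labels, with auc_roc.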