LayoutAD: Exploring Semantic-Geometric Misalignment Reasoning for Scene Layout Anomaly Detection
Abstract
Visual anomaly detection is vital for quality control, where it identifies deviations from normal patterns. Previous structural or logical anomaly detection methods mainly focus on pixel-level deviations, such as texture defects and reconstruction errors, and ignore object-level structural and contextual inconsistencies. These overlooked layout anomalies are critical yet underexplored; one example is the factually defective hallucinations that appear in images produced by text-to-image generative models. Motivated by this observation, we introduce scene layout anomaly detection, a new task that predicts an object-level anomaly map from an input image to reveal the semantic plausibility and geometric consistency of each object in the scene. Specifically, we propose \textit{LayoutAD}, an unsupervised learning framework that constructs semantic and geometric graphs and jointly reasons over the semantic-geometric misalignment among objects. Under this formulation, \textit{LayoutAD} can detect diverse layout deviations, including implausible object attributes and mismatched object relationships. Extensive experiments show that \textit{LayoutAD} outperforms baselines both qualitatively and quantitatively across scenarios and benefits scene understanding and generation applications, including self-corrected image generation and video anomaly detection.