Complementary Prototype Mapping for Efficient Multimodal Anomaly Detection
Abstract
Multimodal unsupervised anomaly detection has garnered increasing attention for robust defect localization. Recent approaches rely on establishing cross-modal matching relationships under normal conditions without explicit guidance. In practice, however, a single observation in one modality may correspond to multiple distinct representations in the other, and such unconditional mappings struggle to adaptively capture these variations, resulting in mapping ambiguity and the misclassification of diverse yet normal variations as anomalies. Moreover, existing methods suffer from slow inference and high memory overhead, hindering their deployment on real-world production lines. To address these issues, we propose an efficient and effective Complementary Prototype Mapping (\textbf{CPMAD}) framework, which dynamically extracts consensus and supplementary prototypes that serve as complementary priors to guide and disambiguate cross-modal mappings. The framework comprises three key components: (1) a Consensus Extraction Module (CEM) that learns a dynamic anchor to transform multimodal features into anomaly-free consensus prototypes, improving cross-modal consistency and suppressing latent anomalies; (2) a Supplementary Query Module (SQM) that employs a Complementary Residual Attention mechanism to capture the discrepancy between the consensus and modality-specific spaces, mining the most representative and discriminative cues as supplementary prototypes; and (3) a Complementary Mapping Module that adaptively integrates both prototype sets to perform feature mapping. Extensive experiments demonstrate that CPMAD not only achieves superior performance in both full-data and few-shot settings across diverse industrial and medical scenarios but also offers faster inference and lower memory consumption than existing methods. The code will be released upon publication.
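The abstract names the three modules and their roles but gives no implementation details, so the following PyTorch sketch is only a hypothetical rendering of the described pipeline: the class names follow the abstract, while every tensor shape, layer choice (multi-head attention, a sigmoid gate), and the prototype count are assumptions made purely for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class ConsensusExtractionModule(nn.Module):
    """CEM (assumed form): a learnable dynamic anchor attends over the
    pooled multimodal tokens to produce anomaly-free consensus prototypes."""
    def __init__(self, dim: int = 256, num_prototypes: int = 8):
        super().__init__()
        self.anchor = nn.Parameter(torch.randn(num_prototypes, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, dim) tokens gathered from both modalities.
        query = self.anchor.unsqueeze(0).expand(feats.size(0), -1, -1)
        consensus, _ = self.attn(query, feats, feats)
        return consensus  # (B, P, dim)

class SupplementaryQueryModule(nn.Module):
    """SQM (assumed form): attends to the residual between modality-specific
    features and the consensus space, i.e. what the consensus fails to
    explain, to mine supplementary prototypes."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.recon_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.query_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, modality_feats: torch.Tensor, consensus: torch.Tensor) -> torch.Tensor:
        # Reconstruct the modality features from the consensus prototypes.
        recon, _ = self.recon_attn(modality_feats, consensus, consensus)
        # Complementary residual: the part the consensus cannot represent.
        residual = modality_feats - recon
        supplementary, _ = self.query_attn(consensus, residual, residual)
        return supplementary  # (B, P, dim)

class ComplementaryMappingModule(nn.Module):
    """Adaptively fuses both prototype sets (via an assumed sigmoid gate)
    and maps source-modality features toward the target modality."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, src_feats, consensus, supplementary):
        g = self.gate(torch.cat([consensus, supplementary], dim=-1))
        prototypes = g * consensus + (1.0 - g) * supplementary
        mapped, _ = self.attn(src_feats, prototypes, prototypes)
        return self.proj(mapped)  # predicted target-modality features

At inference, an anomaly map could then be derived by comparing the mapped features against the observed target-modality features, e.g. a per-location cosine distance; this scoring rule is likewise an assumption, as the abstract does not specify it.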