Poster
Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization
Kai Mao · Ping Wei · Yiyang Lian · Yangyang Wang · Nanning Zheng
Zero-shot and few-shot anomaly detection have recently garnered widespread attention due to their potential applications. Most current methods require an auxiliary training dataset in the same modality as the test images, so they underperform in cross-modal settings, even though cross-modal anomaly detection is an essential task with both practical and research value. We instead propose a framework that is trained on data from a variety of pre-existing modalities and generalizes well to unseen modalities. The model consists of (1) the Transferable Visual Prototype, which directly learns normal/abnormal semantics in the visual space; (2) a Prototype Harmonization strategy that adaptively combines the Transferable Visual Prototypes from the various training modalities for inference on an unknown modality; and (3) a Visual Discrepancy Inference module that further improves performance in the few-shot setting. In the zero-shot setting, the proposed method achieves AUROC improvements of 4.1%, 6.1%, 7.6%, and 6.8% over the best competing methods in the RGB, 3D, MRI/CT, and Thermal modalities, respectively. In the few-shot setting, our model also achieves the highest AUROC/AP on ten datasets across four modalities, substantially outperforming existing methods.
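To make the prototype-and-harmonization idea concrete, below is a minimal sketch (not the authors' code) of how per-modality normal/abnormal visual prototypes might be adaptively weighted and combined to score patches from an unseen modality. All names and shapes here (harmonization_weights, anomaly_score, cosine similarity, softmax weighting over mean max-similarity) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between rows of a (N, D) and rows of b (M, D)."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return a @ b.T

def harmonization_weights(patch_feats, normal_protos):
    """Assumed weighting: score each source modality by how well its normal
    prototypes match the test features, then softmax over the scores."""
    scores = np.array([cosine_sim(patch_feats, p).max(axis=1).mean()
                       for p in normal_protos])
    e = np.exp(scores - scores.max())
    return e / e.sum()

def anomaly_score(patch_feats, normal_protos, abnormal_protos):
    """Per-patch score: weighted (abnormal-sim minus normal-sim) across modalities."""
    w = harmonization_weights(patch_feats, normal_protos)
    score = np.zeros(patch_feats.shape[0])
    for wi, pn, pa in zip(w, normal_protos, abnormal_protos):
        s_n = cosine_sim(patch_feats, pn).max(axis=1)  # similarity to normal prototypes
        s_a = cosine_sim(patch_feats, pa).max(axis=1)  # similarity to abnormal prototypes
        score += wi * (s_a - s_n)
    return score  # higher = more anomalous

# Toy usage: 3 source modalities, 64-D features, 196 test patches.
rng = np.random.default_rng(0)
feats = rng.normal(size=(196, 64))
normals = [rng.normal(size=(10, 64)) for _ in range(3)]
abnormals = [rng.normal(size=(10, 64)) for _ in range(3)]
print(anomaly_score(feats, normals, abnormals).shape)  # (196,)
```

The key design point this sketch captures is that no single modality's prototypes are trusted outright: the weighting lets prototypes from whichever training modalities best fit the unseen test data dominate the anomaly score.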