Joint Learning of General and Diverse Patterns with Mixture of Memory Experts for Weakly-Supervised Video Anomaly Detection
Abstract
Weakly-supervised Video Anomaly Detection (wVAD) aims to detect abnormal events using only binary video-level labels, making it challenging to capture both the diversity of anomalies and their shared semantic cues. Existing methods either focus on a generic anomaly pattern, achieving strong generalization but weak discrimination, or rely on class-level diversity modeling, which ignores shared semantics and suffers from limited generalization. To overcome these limitations, we propose the Mixture of Memory Experts (MoME), a unified framework that jointly learns general and diverse patterns. Each expert in MoME possesses an internal memory for fine-grained specialization, while all experts share an external memory for general knowledge aggregation. To enhance semantic diversity and improve generalization beyond coarse class-level supervision, we introduce an Anomaly Prototype Router (APR) that leverages large language models to construct generalized anomaly prototypes for semantically guided expert routing. Moreover, a regularization loss on the APR ensures balanced routing, a distinctiveness loss among experts encourages diversity, and reconstruction and memory tasks together enhance pattern discriminability. Extensive experiments on UCF-Crime and XD-Violence demonstrate that our approach achieves state-of-the-art performance, validating the effectiveness of jointly modeling generality and diversity for robust anomaly detection under weak supervision.
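The memory-expert mixture described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the attention form, the additive combination of shared and per-expert memory reads, and all names (`attend`, `mome_forward`, `prototypes`) are assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, M = 16, 4, 8  # feature dim, number of experts, memory slots (illustrative sizes)

# Hypothetical components: each expert holds an internal memory;
# all experts share one external memory; prototypes drive routing.
internal_mems = [rng.normal(size=(M, D)) for _ in range(K)]  # per-expert memories
external_mem = rng.normal(size=(M, D))                       # shared memory
prototypes = rng.normal(size=(K, D))                         # anomaly prototypes for the router

def attend(query, memory):
    """Soft memory read: softmax attention over slots, weighted sum of slots."""
    scores = query @ memory.T
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ memory

def mome_forward(x):
    """Route feature x to experts by prototype similarity, mix their outputs."""
    logits = prototypes @ x                  # similarity-based routing scores
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                     # softmax gate over the K experts
    shared = attend(x, external_mem)         # general pattern from shared memory
    outs = np.stack([attend(x, m) + shared for m in internal_mems])
    return gates @ outs                      # gate-weighted mixture of expert outputs

x = rng.normal(size=D)
y = mome_forward(x)
print(y.shape)  # (16,)
```

In this sketch, the shared read injects general knowledge into every expert's output, while the gate selects which specialized internal memories dominate for a given input.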