Skip to yearly menu bar Skip to main content


Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning

Menghao Zhang · Jingyu Wang · Qi Qi · Haifeng Sun · Zirui Zhuang · Pengfei Ren · Ruilong Ma · Jianxin Liao

Arch 4A-E Poster #270
award Highlight
[ ]
Thu 20 Jun 5 p.m. PDT — 6:30 p.m. PDT


Recent progress in video anomaly detection suggests that the features of appearance and motion play crucial roles in distinguishing abnormal patterns from normal ones. However, we note that the effect of spatial scales of anomalies is ignored. The fact that many abnormal events occur in limited localized regions and severe background noise interferes with the learning of anomalous changes. Meanwhile, most existing methods are limited by coarse-grained modeling approaches, which are inadequate for learning highly discriminative features to discriminate subtle differences between small-scale anomalies and normal patterns. To this end, this paper address multi-scale video anomaly detection by multi-grained spatio-temporal representation learning. We utilize video continuity to design three proxy tasks to perform feature learning at both coarse-grained and fine-grained levels, i.e., continuity judgment, discontinuity localization, and missing frame estimation. In particular, we formulate missing frame estimation as a contrastive learning task in feature space instead of a reconstruction task in RGB space to learn highly discriminative features. Experiments show that our proposed method outperforms state-of-the-art methods on four datasets, especially in scenes with small-scale anomalies.

Live content is unavailable. Log in and register to view live content