Streaming Video Crime Anticipation with Spatio-Temporal Causal Reasoning
Abstract
Crime anticipation enables proactive public safety interventions, yet existing video security systems remain largely reactive and cannot detect the precursors of crime. While current video understanding methods based on visual language models (VLMs) show promise in high-level reasoning, they are not designed to explicitly model the spatio-temporal causal relationships essential for anticipating crimes. We address this limitation with two causality-driven components. First, we develop the Spatio-Temporal Causal Reasoning Crime (STCRC) dataset, a hierarchical dataset comprising 73K samples across five progressive causal reasoning tasks, which facilitates the learning of criminal precursors. Second, we propose the Spatio-Temporal Causal Hypergraph (STCH), a streaming module that transforms implicit entity dynamics into explicit causal structures to enhance the causal reasoning of VLMs about crime. By combining these two components, our framework advances real-time crime anticipation, achieving relative improvements of 70.7% in crime classification and 10.1% in crime detection, and a 3.7% reduction in temporal prediction error.