

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Jianwu Fang · Lei-lei Li · Junfei Zhou · Junbin Xiao · Hongkai Yu · Chen Lv · Jianru Xue · Tat-seng Chua

Arch 4A-E Poster #239
Highlight
Fri 21 Jun 10:30 a.m. PDT — noon PDT


We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions. We annotate over 2.23 million object boxes and 58,650 pairs of video-based accident reasons, covering 58 accident categories. MM-AU supports various accident understanding tasks, particularly multimodal video diffusion to understand accident cause-effect chains for safe driving. With MM-AU, we present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video diffusion via an Object-Centric Video Diffusion (OAVD) method which is driven by an abductive CLIP model. This model involves a contrastive interaction loss to learn the pair co-occurrence of normal, near-accident, and accident frames with the corresponding text descriptions, such as accident reasons, prevention advice, and accident categories. OAVD enforces object region learning while fixing the content of the original frame background in video generation, to find the dominant objects for certain accidents. Extensive experiments verify the abductive ability of AdVersa-SD and the superiority of OAVD against state-of-the-art diffusion models. Additionally, we provide careful benchmark evaluations for object detection and accident reason answering, since AdVersa-SD relies on precise object and accident reason information.
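The abstract names two loss-level mechanisms: a contrastive interaction loss that pairs accident-phase frames with their text descriptions, and an object-centric diffusion objective that emphasizes object regions while leaving the frame background fixed. A minimal PyTorch sketch of how such losses are commonly written is below; the function names, the temperature, and the background weight are illustrative assumptions, not the paper's actual code:

```python
import torch
import torch.nn.functional as F

def contrastive_interaction_loss(frame_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over matched frame/text pairs.

    frame_emb, text_emb: (N, D) tensors; row i of each is a matched pair,
    e.g. a near-accident frame and its accident-reason description.
    """
    # L2-normalize so dot products are cosine similarities.
    f = F.normalize(frame_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = f @ t.T / temperature          # (N, N) similarity matrix
    targets = torch.arange(f.size(0))       # matched pairs lie on the diagonal
    # Contrast each frame against all texts, and vice versa.
    loss_f2t = F.cross_entropy(logits, targets)
    loss_t2f = F.cross_entropy(logits.T, targets)
    return 0.5 * (loss_f2t + loss_t2f)

def object_weighted_denoise_loss(noise_pred, noise, obj_mask, bg_weight=0.1):
    """Denoising loss that emphasizes object regions (hypothetical sketch).

    noise_pred, noise: (B, C, H, W); obj_mask: (B, 1, H, W), 1 inside object
    boxes. Background pixels contribute only a small, down-weighted term,
    so learning concentrates on the objects driving the accident.
    """
    per_pixel = (noise_pred - noise) ** 2
    weights = obj_mask + bg_weight * (1.0 - obj_mask)
    return (weights * per_pixel).mean()
```

Both functions return scalar tensors and can be summed with task-specific weights into a single training objective; the diagonal-target construction assumes the batch is built from aligned frame/text pairs.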
