Poster
EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
Zhenghao Xing · Hao Chen · Binzhu Xie · Jiaqi Xu · Ziyu Guo · Xuemiao Xu · Jianye Hao · Chi-Wing Fu · Xiaowei Hu · Pheng-Ann Heng
Traffic Anomaly Understanding (TAU) is essential for improving public safety and transportation efficiency by enabling timely detection and response to incidents. Beyond existing methods, which rely largely on visual data, we propose to consider audio cues, a valuable source that offers strong hints to anomaly scenarios such as crashes and honking. Our contributions are twofold. First, we compile AV-TAU, the first large-scale audio-visual dataset for TAU, providing 29,865 traffic anomaly videos and 149,325 Q&A pairs, while supporting five essential TAU tasks. Second, we develop EchoTraffic, a multimodal LLM that integrates audio and visual data for TAU, through our audio-insight frame selector and dynamic connector to effectively extract crucial audio cues for anomaly understanding with a two-phase training framework. Experimental results on AV-TAU manifest that EchoTraffic sets a new SOTA performance in TAU, outperforming the existing multimodal LLMs. Our contributions, including AV-TAU and EchoTraffic, pave a new direction for multimodal TAU. Our dataset and code will be publicly available upon publication of this work.
Live content is unavailable. Log in and register to view live content