SDTrack: A Baseline for Event-based Tracking via Spiking Neural Networks
Yimeng Shan ⋅ Zhenbang Ren ⋅ Haodi Wu ⋅ Wenjie Wei ⋅ Rui-Jie Zhu ⋅ Shuai Wang ⋅ Dehao Zhang ⋅ Yichen Xiao ⋅ Jieyuan Zhang ⋅ Kexin Shi ⋅ Jingzhinan Wang ⋅ Jason Eshraghian ⋅ Haicheng Qu ⋅ Malu Zhang
Abstract
Event cameras provide superior temporal resolution, dynamic range, energy efficiency, and pixel bandwidth. Spiking Neural Networks (SNNs) naturally complement event data through discrete spike signals, making them ideal for event-based tracking. However, current approaches combining Artificial Neural Networks (ANNs) and SNNs suffer from suboptimal architectures that compromise energy efficiency and limit tracking performance. To address these limitations, we propose the first Transformer-based Spike-Driven Tracking (SDTrack) pipeline. It incorporates a novel event frame aggregation method called Global Trajectory Prompt (GTP) and a Transformer-based tracker. The GTP method effectively captures global trajectory information and aggregates it with event streams into event frames to enhance spatiotemporal representation. The Transformer-based tracker comprises a fully spike-driven SNN backbone and a simple tracking head. The SDTrack pipeline operates end-to-end without data augmentation or post-processing. Extensive experiments demonstrate that our SDTrack-Tiny pipeline achieves competitive accuracy with only 19.61 M parameters and 8.16 mJ energy consumption, while our Base version achieves state-of-the-art accuracy across three datasets. Our work establishes a solid foundation for future neuromorphic vision research.