Spk2VidNet: A Hierarchical Recurrent Architecture for High-Fidelity Video Reconstruction from Long Spike-Camera Streams
Abstract
The spike camera is a neuromorphic vision sensor with ultra-high temporal resolution, capable of capturing fast-moving scenes by firing a stream of binary spikes. However, its relatively low spatial resolution limits the acquisition of fine-grained visual details, motivating research on spike camera super-resolution (SCSR). Existing SCSR methods typically operate on fixed-length spike sequences, so the accessible information is confined to a local temporal neighborhood. Moreover, fluctuations in the spike stream hinder the extraction of intensity information. Both factors limit SCSR performance. To address these issues, we propose a hierarchical recurrent network, Spk2VidNet, that reconstructs high-fidelity, high-resolution image sequences from low-resolution spike data. To mitigate fluctuations, Spk2VidNet progressively exploits temporal correlations within the spike stream, hierarchically enlarging temporal receptive fields to enhance feature representation. Within each recurrent phase, we introduce an alignment module that leverages the motion consistency among multiple frames to jointly estimate and mutually refine inter-frame motions, achieving more accurate temporal alignment. In addition, we propose a fusion module that adaptively integrates neighboring aligned features based on multi-scale similarity for robust feature aggregation. We further propose a segment-wise training strategy with state transfer to efficiently model long-term dependencies under limited GPU memory, thereby leveraging rich sub-pixel cues for improved super-resolution. Experiments on synthetic and real-captured spike data demonstrate that Spk2VidNet achieves state-of-the-art performance.
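The segment-wise training strategy with state transfer amounts to truncated backpropagation through time in which the recurrent state is detached and carried across segment boundaries, so memory cost is bounded by the segment length while long-range temporal information still propagates forward. Below is a minimal PyTorch sketch of that training loop under simplifying assumptions: RecurrentSRCell, train_on_long_stream, and seg_len are hypothetical names, the recurrence is collapsed to a single convolutional cell, and the alignment, fusion, and upsampling components of Spk2VidNet are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentSRCell(nn.Module):
    """Hypothetical stand-in for one recurrent phase of the network."""
    def __init__(self, in_ch=64, hid_ch=64):
        super().__init__()
        self.update = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)

    def forward(self, feat, hidden):
        # Fuse the current spike feature with the carried state.
        return torch.tanh(self.update(torch.cat([feat, hidden], dim=1)))

def train_on_long_stream(cell, head, feats, targets, optimizer, seg_len=10):
    """Segment-wise training with state transfer: the long stream is split
    into short segments; gradients flow only within a segment, while the
    hidden state is detached and passed across segment boundaries."""
    B, T, C, H, W = feats.shape
    hidden = feats.new_zeros(B, 64, H, W)  # initial recurrent state
    for s in range(0, T, seg_len):
        optimizer.zero_grad()
        loss = 0.0
        for t in range(s, min(s + seg_len, T)):
            hidden = cell(feats[:, t], hidden)
            loss = loss + F.l1_loss(head(hidden), targets[:, t])
        loss.backward()           # backprop through this segment only
        optimizer.step()
        hidden = hidden.detach()  # transfer the state, drop its graph

# Toy usage on random tensors (B=2, T=40 steps, 64-channel features).
cell = RecurrentSRCell()
head = nn.Conv2d(64, 3, 3, padding=1)  # reconstruction head (no upsampling here)
opt = torch.optim.Adam(list(cell.parameters()) + list(head.parameters()), lr=1e-4)
feats = torch.randn(2, 40, 64, 32, 32)
targets = torch.randn(2, 40, 3, 32, 32)
train_on_long_stream(cell, head, feats, targets, opt, seg_len=10)
```

The detach call is the key design point: without it, the computation graph would span the entire stream and exhaust GPU memory; with it, each backward pass covers only seg_len steps, yet the forward state still accumulates evidence from the full history.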