DetectSCI: Toward Object-Guided ROI Reconstruction for High-Resolution Video Snapshot Compressive Imaging
Abstract
Video snapshot compressive imaging (SCI) offers a promising alternative to high-speed cameras by encoding multiple frames into a single 2D measurement. However, SCI relies on algorithms to reconstruct the high-speed video, and as resolution increases, reconstruction becomes computationally expensive and memory-intensive. Much of this cost is spent recovering large background regions that contain little useful information, which motivates selective, object-driven reconstruction. Existing object detectors, however, struggle to perform accurately on SCI measurements because of the spatial–temporal aliasing introduced by coded exposure. To address this challenge, we propose DetectSCI, the first framework enabling object-guided region-of-interest (ROI) reconstruction for high-resolution SCI. Its detector comprises two key components: an encoder built from weight-sharing Mamba-Implicit Modules (MIM) for progressive feature refinement, and a Frequency Mamba (FM) module dedicated to frequency-aware query selection. MIM enhances features via multi-scale dilated convolutions and implicit representations, while FM restores discriminative details by decomposing and reweighting frequency bands. Experiments on the SportsMOT dataset show that DetectSCI achieves 80.9 Average Precision (AP), surpassing the best CNN-based detector by at least 2.8 AP and the best Transformer-based detector by at least 4.1 AP, while maintaining comparable efficiency. Code will be released.
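To illustrate how multiple frames collapse into one 2D measurement, and why an ROI can be isolated before reconstruction, the following is a minimal NumPy sketch. It assumes the mask-modulated forward model commonly used in the video SCI literature (each frame is multiplied element-wise by a coding mask and the results are summed); the abstract itself does not specify the sensing model. The function names `sci_measure` and `crop_roi`, the random binary masks, and the example box are hypothetical and are not taken from DetectSCI.

```python
import numpy as np

def sci_measure(frames, masks):
    """Collapse T frames of shape (T, H, W) into one (H, W) measurement.

    Assumed forward model: Y = sum_t M_t * X_t (element-wise product).
    """
    return np.sum(frames * masks, axis=0)

def crop_roi(measurement, masks, box):
    """Cut the measurement and masks to a box given as (x0, y0, x1, y1).

    The box stands in for a detector output; its format is an assumption.
    """
    x0, y0, x1, y1 = box
    return measurement[y0:y1, x0:x1], masks[:, y0:y1, x0:x1]

rng = np.random.default_rng(0)
T, H, W = 8, 64, 64
frames = rng.random((T, H, W))                     # stand-in video clip
masks = rng.integers(0, 2, size=(T, H, W)).astype(frames.dtype)  # binary codes

y = sci_measure(frames, masks)                     # single 2D snapshot
box = (16, 8, 48, 40)                              # hypothetical detection
y_roi, masks_roi = crop_roi(y, masks, box)

# Under this model, each measurement pixel depends only on the same pixel
# in every frame, so the cropped measurement equals the measurement of the
# cropped frames.
frames_roi = frames[:, 8:40, 16:48]
roi_is_consistent = np.allclose(y_roi, sci_measure(frames_roi, masks_roi))
```

Under this assumed model the sensing is pixel-wise, so a reconstruction network can in principle be run on `y_roi` and `masks_roi` alone, with cost scaling with the ROI area rather than the full frame.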