PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion
Abstract
Video dataset condensation aims to mitigate the immense computational cost of video processing, but it faces the unique challenge of preserving the complex interplay between spatial content and temporal dynamics. Prior work often unnaturally disentangles these elements, overlooking their essential interdependence. We introduce Progressive Refinement and Insertion for Sparse Motion (PRISM), a novel approach that preserves this critical coupling. PRISM begins with a minimal set of key frames and dynamically synthesizes new ones by using gradient misalignment to identify moments of high motion complexity, where simple interpolation fails. This adaptive process allocates new frames only where such complexity exists, creating efficient and temporally coherent synthetic datasets. Extensive experiments show that PRISM achieves highly competitive performance on standard action recognition benchmarks, often matching or exceeding prior methods, while producing powerful representations with significantly less storage.
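The core insertion rule described above can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's implementation): it assumes each synthetic key frame carries a gradient vector from some matching objective, and it inserts an interpolated midpoint frame between adjacent key frames whose gradients disagree (low cosine similarity), i.e. where simple interpolation would fail.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two gradient vectors (epsilon avoids /0).
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def insert_frames(frames, grads, threshold=0.5):
    """Progressively insert frames where gradient misalignment is high.

    frames: list of frame arrays, the current sparse key frames.
    grads:  list of per-frame gradient vectors (hypothetical stand-ins
            for PRISM's actual matching signal).
    Returns new frame/gradient lists with a midpoint frame inserted
    wherever adjacent key frames' gradients are misaligned.
    """
    new_frames, new_grads = [frames[0]], [grads[0]]
    for i in range(1, len(frames)):
        if cosine(grads[i - 1], grads[i]) < threshold:
            # High motion complexity: initialize a new frame by interpolation,
            # then (in the full method) it would be refined further.
            new_frames.append(0.5 * (frames[i - 1] + frames[i]))
            new_grads.append(0.5 * (grads[i - 1] + grads[i]))
        new_frames.append(frames[i])
        new_grads.append(grads[i])
    return new_frames, new_grads
```

For example, three key frames whose last two gradients point in opposite directions would trigger exactly one insertion, growing the sequence from three frames to four; aligned segments are left untouched, which is what keeps the synthetic dataset sparse.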