Skip to yearly menu bar Skip to main content


Poster

SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation

Jiaben Chen · Huaizu Jiang


Abstract: Human-centric video frame interpolation has great potential for enhancing entertainment experiences and finding commercial applications in the sports analysis industry, e.g., synthesizing slow-motion videos. Although there are multiple benchmark datasets available for video frame interpolation in the community, none of them is dedicated to human-centric scenarios. To bridge this gap, we introduce SportsSloMo, a benchmark featuring over 130K high-resolution ($\geq$720p) slow-motion sports video clips, totaling over 1M video frames, sourced from YouTube. We re-train several state-of-the-art methods on our benchmark, and we observed a noticeable decrease in their accuracy compared to other datasets. This highlights the difficulty of our benchmark and suggests that it poses significant challenges even for the best-performing methods, as human bodies are highly deformable and occlusions are frequent in sports videos. To tackle these challenges, we propose human-aware loss terms, where we add auxiliary supervision for human segmentation in panoptic settings and keypoints detection. These loss terms are model-agnostic and can be easily plugged into any video frame interpolation approach. Experimental results validate the effectiveness of our proposed human-aware loss terms, leading to consistent performance improvement over existing models. The dataset and code will be publicly released to foster future research.

Live content is unavailable. Log in and register to view live content