FPSBench: A Benchmark for Video Understanding at High Frame Rates
Abstract
Modern video-language models (VLMs) are typically trained on videos downsampled to a low frame rate (frames per second, FPS), and the most widely used evaluation benchmarks are likewise designed for low-FPS input. To address this shortcoming, we present FPSBench, a large video question-answering benchmark designed to evaluate VLMs’ ability to understand video at high frame rates. We introduce a new metric, the minimum frames-per-second (minFPS), which measures the lowest frame rate at which a given question can be solved. While existing benchmarks require minFPS < 1, we rigorously curate more than 1,000 questions from diverse video sources and manually verify the minFPS of each example, yielding a benchmark that requires watching videos at 7 FPS on average to solve. Our evaluation of several state-of-the-art VLMs shows that they fall far short of human performance, achieving 30\% accuracy on the FPSBench multiple-choice task, while humans achieve 72\%. We believe that FPSBench will serve as a valuable tool for improving frontier-level VLMs, and we will release all data and code.
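The abstract does not specify how minFPS is measured, but the idea of finding the lowest frame rate at which a question becomes solvable can be sketched as a sweep over candidate rates. Everything below is a hypothetical illustration: `min_fps`, `sample_frames`, and `answer_fn` are made-up names, and the toy answer function merely simulates a question that requires seeing an event lasting about 0.25 seconds.

```python
# Hypothetical sketch of measuring minFPS for one question.
# In practice, `answer_fn` would wrap a call to a real VLM given frames
# sampled at the candidate rate; here it is a stub for illustration only.

def sample_frames(video_duration_s, fps):
    """Return timestamps (in seconds) of frames sampled uniformly at `fps`."""
    n = max(1, int(video_duration_s * fps))
    return [i / fps for i in range(n)]

def min_fps(question, video_duration_s, answer_fn,
            candidate_rates=(0.5, 1, 2, 4, 8, 16)):
    """Smallest candidate frame rate at which `answer_fn` answers correctly.

    `answer_fn(question, frame_times) -> bool` is assumed to query a model.
    Returns None if no candidate rate suffices.
    """
    for fps in candidate_rates:
        frames = sample_frames(video_duration_s, fps)
        if answer_fn(question, frames):
            return fps
    return None

# Toy stand-in: pretend the question needs frames at most 0.25 s apart,
# i.e. it is only solvable at 4 FPS or higher.
def toy_answer_fn(question, frame_times):
    if len(frame_times) < 2:
        return False
    gap = frame_times[1] - frame_times[0]
    return gap <= 0.25

print(min_fps("When does the ball leave the hand?", 10.0, toy_answer_fn))
# prints 4
```

Under this sketch, a benchmark-wide average minFPS would simply be the mean of `min_fps` over all questions, with manual verification replacing the automated sweep for each curated example.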