Skip to yearly menu bar Skip to main content


FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding

Jinglin Xu · Guohao Zhao · Sibo Yin · Wenhao Zhou · Yuxin Peng

Arch 4A-E Poster #216
[ ]
Fri 21 Jun 10:30 a.m. PDT — noon PDT


Fine-grained action analysis in multi-person sports is complex due to athletes' quick movements and intense physical confrontations, which result in severe visual obstructions in most scenes. In addition, accessible multi-person sports video datasets lack fine-grained action annotations in both space and time, adding to the difficulty in fine-grained action analysis.To this end, we construct a new multi-person basketball sports video dataset named FineSports, which contains fine-grained semantic and spatial-temporal annotations on 10,000 NBA game videos, covering 52 fine-grained action types, 16,000 action instances, and 123,000 spatial-temporal bounding boxes. We also propose a new prompt-driven spatial-temporal action location approach called PoSTAL, composed of a prompt-driven target action encoder (PTA) and an action tube-specific detector (ATD) to directly generate target action tubes with fine-grained action types without any off-line proposal generation. Extensive experiments on the FineSports dataset demonstrate that PoSTAL outperforms state-of-the-art methods. Data and code are available at

Live content is unavailable. Log in and register to view live content