SoccerMaster: A Vision Foundation Model for Soccer Understanding
Abstract
Soccer understanding has recently garnered growing research interest due to its domain-specific complexity and unique challenges. However, prior works typically rely on task-specific expert models, which are resource-intensive and hinder a holistic view of the game. This paper proposes a unified framework that enables a single model to handle diverse soccer visual understanding tasks, spanning both fine-grained perception (e.g., athlete detection) and semantic reasoning (e.g., event classification). Concretely, we make the following contributions: (i) we present SoccerMaster, the first soccer-specific vision foundation model that unifies comprehensive understanding tasks within a single framework via supervised multi-task pretraining; (ii) we consolidate multiple existing soccer video datasets and develop an automated data curation pipeline, termed SoccerFactory, to produce scalable multi-task training annotations; and (iii) we conduct extensive experiments demonstrating that SoccerMaster consistently outperforms task-specific expert models across diverse downstream tasks, underscoring its breadth and superiority. The data, code, and model will be made publicly available to the research community.