MDS-VQA: Model-Informed Data Selection for Video Quality Assessment
Abstract
Recent advances in learning-based video quality assessment (VQA) have achieved remarkable progress, yet its two fundamental components, model and data, are often studied in isolation. Model-centric approaches design increasingly sophisticated architectures on fixed, repeatedly reused datasets, risking overfitting to benchmark-specific characteristics. In contrast, data-centric efforts emphasize constructing large-scale datasets through costly and time-consuming subjective experiments, typically overlooking the strengths and failure modes of existing VQA models. This separation limits progress, leading to brittle generalization and inefficient use of annotation resources. To bridge this gap, we introduce MDS-VQA, a model-informed data selection method that integrates model-centric and data-centric VQA. In our instantiation, a failure prediction module trained with a learning-to-rank objective is combined with a content diversity measure computed on deep semantic video features. Experiments across multiple VQA datasets demonstrate that MDS-VQA effectively identifies diverse and challenging samples that expose model weaknesses. The selected videos prove particularly informative for fine-tuning, offering a principled path toward constructing more challenging datasets and developing more generalizable and robust VQA models.
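The abstract describes combining a predicted failure score with a content diversity measure over deep video features. As a minimal sketch of one way such a selection could work (the function name, the `alpha` trade-off, and the greedy max-min diversity rule are illustrative assumptions, not the paper's actual algorithm):

```python
import numpy as np

def select_videos(failure_scores, features, k, alpha=0.5):
    """Greedily pick k videos, trading off predicted difficulty
    against diversity in a deep semantic feature space.

    failure_scores : (N,) array, higher = model more likely to fail
                     (assumed output of a learned failure predictor)
    features       : (N, D) array of semantic video embeddings
    alpha          : hypothetical weight between difficulty and diversity
    """
    n = len(failure_scores)
    # Normalize difficulty to [0, 1] and L2-normalize features so that
    # the two signals are on comparable scales.
    s = (failure_scores - failure_scores.min()) / (np.ptp(failure_scores) + 1e-12)
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)

    selected = [int(np.argmax(s))]            # seed with the hardest sample
    min_dist = np.full(n, np.inf)
    while len(selected) < k:
        # Distance of every video to the most recently selected one;
        # min_dist then tracks distance to the nearest selected video.
        d = np.linalg.norm(f - f[selected[-1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        # Unit-norm features give distances in [0, 2]; rescale to [0, 1].
        gain = alpha * s + (1 - alpha) * min_dist / 2.0
        gain[selected] = -np.inf              # never re-pick a video
        selected.append(int(np.argmax(gain)))
    return selected
```

Under this sketch, `alpha=1.0` reduces to pure hard-example mining, while `alpha=0.0` reduces to farthest-point diversity sampling; the combined score is what keeps the selected set both challenging and varied.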