HDR-VLM: HDR-Domain Adaptation of VLMs and Preference-Aligned Quality Assessment for HDR Video Color Grading
Abstract
Color grading is central to High Dynamic Range (HDR) video production, shaping the perceptual tone, contrast, and luminance of content across diverse displays. However, evaluating HDR color grading quality is particularly difficult due to its semantic, content-dependent nature and the lack of large-scale annotated data. While pre-trained Vision–Language Models (VLMs) offer strong semantic priors and generalization ability, their exposure is limited to Standard Dynamic Range (SDR) data, leaving them poorly equipped to handle HDR photometry and its perceptual nuances. We propose HDR-VLM, the first method to adapt a VLM to the HDR domain for perceptual quality assessment. Specifically, HDR-VLM employs a two-stage design: it first bridges the domain gap using a unified Hybrid Log-Gamma (HLG)-based encoding and progressive adaptation; it then aligns model assessments with noisy, multi-scale human preferences via reinforcement learning with curriculum-inspired rewards. Experiments on a real-world, production-sourced HDR dataset show that HDR-VLM not only outperforms existing quality assessment methods but also produces interpretable attribution rationales. These rationales offer actionable guidance for content creators, enhancing the reliability and transparency of automated HDR quality evaluation.