Perceptual Neural Video Compression with Color Separation and Rank Chain
Abstract
Neural video compression (NVC) has achieved significant progress in recent years. State-of-the-art (SOTA) NVC schemes, exemplified by the Deep Conditional Video Coding (DCVC) series, have focused on higher fidelity (e.g., PSNR) but underexploit the capacity of deep networks to improve perceptual quality. We fill this gap with two new techniques. First, we propose a color-separation-based framework, termed PNVC-C, which decouples luminance and chrominance processing to better align with human visual perception. This framework enables explicit and adaptive allocation of computation and bitrate budgets between the luminance and chrominance components. Second, within this framework, we introduce a perceptual optimization scheme, Rc-GAN, which uses a bitrate-based rank chain loss to link variable-rate coding with perceptual quality ranking, enforcing a consistent quality ordering across bitrates and improving perceptual fidelity. Building on these designs, we instantiate PNVC-C in two variants: PNVC-C-Base, optimized for objective fidelity, and PNVC-CR, a perceptual variant that applies Rc-GAN. Experimental results demonstrate that PNVC-C-Base achieves SOTA objective performance in YUV PSNR, while PNVC-CR attains SOTA perceptual quality on the LPIPS, DISTS, KID, and FID metrics. Code and models will be publicly available.
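The abstract does not give the form of the bitrate-based rank chain loss. As a rough illustration only, one plausible reading of "consistent quality ordering" is a hinge penalty over adjacent rate points: a reconstruction coded at a higher bitrate should not score worse than the one coded at the next lower bitrate. The function name `rank_chain_loss`, the `margin` parameter, and the hinge form itself are assumptions made for this sketch, not the authors' formulation.

```python
from typing import Sequence


def rank_chain_loss(quality: Sequence[float], margin: float = 0.0) -> float:
    """Hypothetical rank chain penalty (an assumption, not the paper's loss).

    `quality` holds one perceptual quality score per rate point, ordered
    from lowest to highest bitrate, where a larger score means better
    quality. For a distance such as LPIPS (lower is better), pass the
    negated values instead.
    """
    if len(quality) < 2:
        return 0.0  # a single rate point has no ordering to enforce

    penalty = 0.0
    # Walk the chain of adjacent (lower-rate, higher-rate) pairs.
    for low_rate, high_rate in zip(quality, quality[1:]):
        # Penalize only when the higher-rate score fails to exceed the
        # lower-rate score by at least `margin`.
        penalty += max(0.0, margin + low_rate - high_rate)

    # Average over the number of adjacent pairs.
    return penalty / (len(quality) - 1)
```

Under these assumptions, a sequence of scores that already increases with bitrate incurs zero penalty, and any inversion between neighboring rate points contributes in proportion to its size. In a differentiable training setup the same structure could be written with tensor operations so that gradients flow back to the variable-rate codec.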