Content-Adaptive Hierarchical Hyperprior for Neural Video Coding
Abstract
Neural video codecs (NVCs) have recently demonstrated superior performance over traditional codecs through end-to-end learning. However, existing approaches focus primarily on architectural enhancements and coding module design, and the optimization of hierarchical structures (specifically, quality and reference configurations) remains largely unexplored. Current hierarchical structure optimization methods face two major limitations: (1) insufficient content-adaptive optimization, and (2) separate handling of quality and reference structures. To overcome these limitations, we propose a novel NVC framework that performs content-adaptive hierarchical structure optimization through a hierarchical hyperprior derived from the current frame. Our NVC integrates two key components: (1) a hierarchical hyperprior extracted from the original frame, which enables content-aware adaptation of the hierarchical structure; and (2) an adaptor within the hierarchical hyperprior codec, combined with a dual-reference scheme guided by the hyperprior, which jointly optimizes the quality and reference structures. By leveraging this content-adaptive hierarchical structure, our NVC achieves state-of-the-art rate-distortion performance: relative to VTM-23.4 low-delay B (LDB), it attains BD-rate reductions of 15.51\% and 12.20\% under intra-period settings of -1 and 32, respectively, outperforming the previous leading NVC method, DCVC-FM.