High Resolution Neural Video Coding with Bi-directional Confidence-Guided Reference Information Modeling
Abstract
Exploiting bi-directional context prediction has long been recognized as a key direction for improving compression efficiency in neural video coding (NVC). However, existing neural B-frame codecs still exhibit limited performance gains, particularly on high-resolution videos with large motion, where optical flow estimation becomes unreliable and balanced prediction fusion introduces distortions. To address these challenges, we present the first high-resolution bi-directional neural video coding method, termed HR-NVC, which non-uniformly integrates confidence-guided predictive cues from both temporal directions to achieve more reliable and efficient compression. Specifically, we propose Spatio-Temporal Anchored Motion Estimation, which introduces virtual anchor frames and low-resolution priors to significantly improve estimation robustness under large displacements. We further design a Hierarchical Motion Representation that combines multi-scale motion with temporal references, enabling compact and adaptive modeling of motion reliability across resolutions. Finally, a Bi-Contextual Asymmetric Harmonization module performs confidence-guided fusion of the bi-directional references, effectively suppressing unreliable contexts and restoring structural consistency near occlusions and scene transitions. Notably, our model is the first end-to-end-optimized video codec evaluated on 4K-resolution videos, establishing a new benchmark for higher-resolution NVC and achieving state-of-the-art performance among neural B-frame codecs.
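As a rough illustration of what non-uniform, confidence-guided fusion of two temporal references can mean, the sketch below blends a forward and a backward prediction per pixel, weighting each by a confidence value. It is a minimal toy under stated assumptions, not the Bi-Contextual Asymmetric Harmonization module itself: the function name `fuse_bidirectional`, the normalized-weight rule, and the fallback to a plain average are all hypothetical, and the confidence maps are assumed to be given rather than learned.

```python
def fuse_bidirectional(pred_fwd, pred_bwd, conf_fwd, conf_bwd, eps=1e-8):
    """Blend two per-pixel predictions using per-pixel confidence weights.

    All four arguments are equally sized 2-D lists of floats. Confidences
    are assumed to be non-negative. The weighting rule is an illustrative
    assumption, not the method described in the abstract.
    """
    fused = []
    for row_f, row_b, row_cf, row_cb in zip(pred_fwd, pred_bwd, conf_fwd, conf_bwd):
        fused_row = []
        for p_f, p_b, c_f, c_b in zip(row_f, row_b, row_cf, row_cb):
            total = c_f + c_b
            if total < eps:
                # Neither direction is trusted: fall back to a plain average.
                w_f = 0.5
            else:
                # Weight of the forward prediction grows with its confidence.
                w_f = c_f / total
            fused_row.append(w_f * p_f + (1.0 - w_f) * p_b)
        fused.append(fused_row)
    return fused


# Toy 1x2 "frame": the second pixel is occluded in the forward reference,
# so its forward confidence is zero and the backward prediction is used.
pred_fwd = [[10.0, 20.0]]
pred_bwd = [[30.0, 40.0]]
conf_fwd = [[1.0, 0.0]]
conf_bwd = [[1.0, 1.0]]
fused = fuse_bidirectional(pred_fwd, pred_bwd, conf_fwd, conf_bwd)
print(fused)  # [[20.0, 40.0]]
```

With equal confidences the result reduces to the balanced average, while a zero confidence on one side suppresses that reference entirely, which is the asymmetric behavior the abstract contrasts with balanced fusion.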