PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos

Qi Zhao · M. Salman Asif · Zhan Ma

Arch 4A-E Poster #434
Thu 20 Jun 5 p.m. PDT — 6:30 p.m. PDT


The primary focus of Neural Representation for Videos (NeRV) is the effective modeling of spatiotemporal consistency. However, current NeRV systems often suffer from spatial inconsistency, which degrades perceptual quality. To address this issue, we introduce the Pyramidal Neural Representation for Videos (PNeRV), which is built on multi-scale information connections and comprises a lightweight rescaling operator, the Kronecker Fully-connected layer (KFc), and a Benign Selective Memory (BSM) mechanism. The KFc, inspired by the tensor decomposition of the vanilla fully-connected layer, enables low-cost rescaling and global correlation modeling. BSM adaptively merges high-level features with granular ones. Furthermore, we provide an analysis of NeRV systems based on Universal Approximation Theory and validate the effectiveness of the proposed PNeRV. Comprehensive experiments show that PNeRV surpasses contemporary NeRV models, achieving the best results in video regression on UVG and DAVIS across various metrics (PSNR, SSIM, LPIPS, and FVD). Compared to vanilla NeRV, PNeRV achieves a +4.49 dB gain in PSNR and a 231% improvement in FVD on UVG, along with a +3.28 dB PSNR gain and a 634% FVD improvement on DAVIS.
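The abstract's KFc builds on the tensor decomposition of a vanilla fully-connected layer. A minimal sketch of the underlying idea, assuming a plain Kronecker factorization (the actual PNeRV KFc design may differ; all names here are illustrative): a large weight matrix W of shape (m1*m2, n1*n2) is replaced by two small factors A (m1 x n1) and B (m2 x n2) with W = A kron B, and the matrix-vector product is computed without ever materializing W, via the identity (A kron B) vec(X) = vec(B X A^T).

```python
import numpy as np

def kron_fc(x, A, B):
    """Apply y = (A kron B) @ x without forming the full Kronecker matrix.

    A: (m1, n1), B: (m2, n2); x is the column-major vectorization of an
    (n2, n1) matrix X. Cost is O(n2*n1*(m2 + m1)) instead of O(m1*m2*n1*n2).
    """
    m1, n1 = A.shape
    m2, n2 = B.shape
    X = x.reshape(n1, n2).T        # recover X (column-major vec inverse)
    Y = B @ X @ A.T                # (m2, n2) @ (n2, n1) @ (n1, m1) -> (m2, m1)
    return Y.T.reshape(-1)         # column-major vec(Y)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))    # output dims need not match input dims,
B = rng.standard_normal((2, 5))    # which is what permits cheap rescaling
x = rng.standard_normal(4 * 5)

y_fast = kron_fc(x, A, B)
y_ref = np.kron(A, B) @ x          # dense reference for comparison
assert np.allclose(y_fast, y_ref)
```

Because m1*m2 need not equal n1*n2, the same trick changes feature dimensionality (rescaling) at the cost of two small matrices, which is consistent with the "low-cost rescaling and global correlation modeling" claimed for KFc.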
