

Combining Frame and GOP Embeddings for Neural Video Representation

Jens Eirik Saethre · Roberto Azevedo · Christopher Schroers

Arch 4A-E Poster #432
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Implicit neural representations (INRs) were recently proposed as a new video compression paradigm, with existing approaches performing on par with H.264 and HEVC. However, such methods only perform well in limited settings, e.g., specific model sizes, fixed aspect ratios, and low-motion videos. We address this issue by proposing T-NeRV, a hybrid video INR that combines frame-specific embeddings with GOP-specific features, providing a lever for content-specific fine-tuning. We employ entropy-constrained training to jointly optimize our model for rate and distortion and demonstrate that T-NeRV can thereby automatically adjust this lever during training, effectively fine-tuning itself to the target content. We evaluate T-NeRV on the UVG dataset, where it achieves state-of-the-art results on the video regression task, outperforming previous works by up to 3 dB PSNR on challenging high-motion sequences. Further, our method improves on the compression performance of previous methods and is the first video INR to outperform HEVC on all UVG sequences.
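The abstract describes entropy-constrained training as jointly optimizing rate and distortion. The sketch below shows the generic rate-distortion objective D + λ·R common in learned compression, together with the PSNR metric used to report results. It is not T-NeRV's actual loss. All function names are hypothetical. The rate term uses the empirical Shannon entropy of already-quantized parameters as a simple stand-in. Real entropy-constrained training instead relies on a differentiable entropy model, so this code only evaluates the objective and does not train anything.

```python
import math
from collections import Counter


def mse(ref, rec):
    """Mean squared error between two equal-length sequences of pixel values."""
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)


def psnr(ref, rec, max_val=1.0):
    """Peak signal-to-noise ratio in dB for pixel values in [0, max_val]."""
    err = mse(ref, rec)
    if err == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10 * math.log10(max_val ** 2 / err)


def empirical_bits(symbols):
    """Shannon estimate of the bits needed to code quantized parameters under
    their own empirical distribution (hypothetical stand-in for a learned,
    differentiable entropy model)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def rd_loss(ref, rec, quantized_params, lam):
    """Generic rate-distortion objective D + lam * R, where D is the MSE and
    R is the estimated parameter bits divided by the number of pixels."""
    rate = empirical_bits(quantized_params) / len(ref)
    return mse(ref, rec) + lam * rate


if __name__ == "__main__":
    ref = [0.0, 0.5, 1.0, 0.5]
    rec = [0.1, 0.5, 0.9, 0.5]
    params = [0, 0, 1, 1]  # toy quantized weights: 2 symbols, 1 bit each
    print(f"PSNR: {psnr(ref, rec):.2f} dB")
    print(f"RD loss (lam=0.01): {rd_loss(ref, rec, params, 0.01):.4f}")
```

Raising λ trades reconstruction quality for fewer bits, which is how a single objective can steer a model toward different operating points. For scale, a 3 dB PSNR gain corresponds to reducing the MSE by a factor of 10^(3/10), roughly 2.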
