Poster

Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model

Shuyun Wang · Hu Zhang · Xin Shen · Dadong Wang · Xin Yu


Abstract:

Bitstream-corrupted video recovery aims to fill in realistic video content for regions lost to bitstream corruption during video storage or transmission. Most existing methods assume that predefined masks of the corrupted regions are known in advance. However, manually annotating these input masks is laborious and time-consuming, limiting the applicability of existing methods in real-world scenarios. We therefore relax this assumption and define a new blind video recovery setting in which the recovery of corrupted regions does not rely on predefined masks. This scenario poses two primary challenges: (i) without predefined masks, how accurately can a model identify the regions requiring recovery? (ii) how can extensive and irregular content be recovered, especially when large portions of frames are severely degraded or corrupted at large scale? To address these challenges, we introduce a Metadata-Guided Diffusion Model, dubbed M-GDM. To enable the diffusion model to focus on the corrupted regions, we leverage inherent video metadata as a corruption indicator and design a dual-stream metadata encoder. This encoder first processes the motion vectors and frame types of a video separately, and then merges them into a unified metadata representation. This metadata representation interacts with the corrupted latent feature via cross-attention at each diffusion step. Meanwhile, to preserve the intact regions, we propose a prior-driven mask predictor that generates pseudo masks for the corrupted regions by leveraging the metadata prior and the diffusion prior. These pseudo masks enable the separation and recombination of intact and recovered regions through hard masking. However, imperfections in the pseudo masks and the hard masking process often result in boundary artifacts. We therefore introduce a post-refinement module that refines the hard-masked outputs, enhancing the consistency between intact and recovered regions. Extensive experimental results validate the effectiveness of our method and demonstrate its superiority in video recovery tasks.
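
To make the described pipeline concrete, the PyTorch sketch below illustrates how the two key ideas in the abstract might be wired together: a dual-stream encoder that fuses motion vectors and frame types into a unified metadata representation which the diffusion denoiser cross-attends to at each step, and a hard-masking recombination of intact and recovered regions using a predicted pseudo mask. All module names, shapes, and hyper-parameters here are illustrative assumptions, not the authors' implementation.

# Minimal sketch of metadata-guided conditioning and pseudo-mask recombination.
# All names and dimensions are hypothetical placeholders.
import torch
import torch.nn as nn

class DualStreamMetadataEncoder(nn.Module):
    """Encodes motion vectors and frame types separately, then fuses them."""
    def __init__(self, d_model=256):
        super().__init__()
        self.mv_proj = nn.Sequential(nn.Linear(2, d_model), nn.GELU())  # (dx, dy) per block
        self.ft_embed = nn.Embedding(3, d_model)                        # I/P/B frame types
        self.fuse = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)

    def forward(self, motion_vectors, frame_types):
        # motion_vectors: (B, N, 2) block-wise motion; frame_types: (B, N) integer codes
        tokens = self.mv_proj(motion_vectors) + self.ft_embed(frame_types)
        return self.fuse(tokens)                                        # (B, N, d_model)

class MetadataConditionedDenoiser(nn.Module):
    """Toy denoiser whose latent tokens cross-attend to the metadata tokens."""
    def __init__(self, d_model=256):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, latent_tokens, metadata_tokens):
        # latent_tokens: (B, T, d_model) noisy video latents (flattened)
        attended, _ = self.cross_attn(latent_tokens, metadata_tokens, metadata_tokens)
        return self.mlp(latent_tokens + attended)

def recombine(recovered, original, pseudo_mask):
    """Hard masking: keep intact pixels, paste recovered content in corrupted areas."""
    # pseudo_mask: (B, 1, H, W) in {0, 1}, where 1 marks a corrupted region
    return pseudo_mask * recovered + (1.0 - pseudo_mask) * original

if __name__ == "__main__":
    B, N, T, d = 2, 64, 128, 256
    encoder = DualStreamMetadataEncoder(d)
    denoiser = MetadataConditionedDenoiser(d)

    mv = torch.randn(B, N, 2)
    ft = torch.randint(0, 3, (B, N))
    latents = torch.randn(B, T, d)

    metadata = encoder(mv, ft)              # unified metadata representation
    denoised = denoiser(latents, metadata)  # one conditioned denoising step

    frames = torch.rand(B, 3, 64, 64)
    recovered = torch.rand(B, 3, 64, 64)
    mask = (torch.rand(B, 1, 64, 64) > 0.7).float()
    output = recombine(recovered, frames, mask)
    print(denoised.shape, output.shape)

In the actual method, the recombined output would additionally pass through the post-refinement module described above to remove boundary artifacts; that step is omitted from this sketch.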
