GauMVC: Generative Decoupled Gaussian Representation for Human-centric Multi-view Video Compression
Abstract
Human-centric multi-view video has a clear semantic structure: a static background and dynamic human motion. We propose a generative compression framework that explicitly decouples these two components. The background is modeled once with 3D Gaussian Splatting, while the human is represented by a personalized Gaussian avatar, reconstructed from a sparse set of key views transmitted only once and driven by compact per-frame pose parameters from the Skinned Multi-Person Linear (SMPL) model. The encoder therefore sends only three elements: the background model, the key views, and the per-frame SMPL parameters, enabling high-fidelity multi-viewpoint synthesis at dramatically reduced bitrates. This shifts compression from low-level redundancy removal to semantics-aware generative modeling. Experiments on multiple human-centric datasets demonstrate superior rate–distortion performance, particularly for long and densely captured sequences, and the decoupled representation naturally enables semantic editing.
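To make the bitrate argument concrete, the decoupled bitstream can be sketched as two payload types: a one-shot payload (background model and avatar key views) and a lightweight per-frame payload carrying only SMPL motion parameters. The following is a minimal illustrative sketch, not the paper's actual bitstream format; the class names, field sizes, and float16 quantization are assumptions for illustration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class OneShotPayload:
    """Illustrative: transmitted once per sequence."""
    background_gaussians: np.ndarray  # static 3D Gaussian parameters
    key_views: list                   # sparse images for avatar reconstruction

@dataclass
class PerFramePayload:
    """Illustrative: transmitted every frame (motion only)."""
    pose: np.ndarray         # SMPL body pose: 24 joints x 3 axis-angle = 72 values
    translation: np.ndarray  # global root translation: 3 values

def per_frame_bits(payload: PerFramePayload, bytes_per_value: int = 2) -> int:
    """Bits per frame if parameters are quantized to float16 (assumed)."""
    n_values = payload.pose.size + payload.translation.size
    return n_values * bytes_per_value * 8

frame = PerFramePayload(pose=np.zeros(72), translation=np.zeros(3))
print(per_frame_bits(frame))  # 75 values x 2 bytes x 8 = 1200 bits/frame
```

Under these assumed sizes, the recurring per-frame cost is on the order of a kilobit, which is why shifting pixel content into the one-shot payload can reduce bitrate dramatically for long sequences.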