$\alpha$Matte4K & $\mu$Matting: Dataset and Model for Ultra-Micro Precision Alpha Video Matting
Xinyi Chen ⋅ Hang Dong ⋅ Baowei Jiang ⋅ Shenkun Xu ⋅ Youqi Guan ⋅ Kanle Shi ⋅ Kun Gai ⋅ Haichuan Song
Abstract
High-resolution human video matting aims to predict accurate alpha mattes for semi-transparent regions while ensuring temporal consistency across frames. Despite notable progress, existing research remains limited by the insufficient quality of available datasets, including (1) inaccurate fractional alpha values resulting from imperfect annotation, and (2) visual inconsistencies arising from arbitrary foreground-background compositions that lack natural coherence. In this paper, we introduce $\alpha$Matte4K, a large-scale 4K-resolution human video matting dataset that achieves accurate annotations and physical consistency through physically based rendering (PBR). From the model perspective, current methods, constrained by computational costs, often up-sample alpha outputs to meet target resolutions, which unavoidably diminishes precision. To overcome this critical limitation, we introduce $\mu$Matting, an innovative resolution-agnostic two-stage framework for video matting: (1) coarse matte localization using a portrait-aware masked autoencoder; and (2) refinement of critical regions via sparse 3D convolution, augmented by a temporal modulator that injects global spatio-temporal cues for enhanced consistency and contextual awareness. Extensive experiments show that $\alpha$Matte4K boosts baseline performance, while $\mu$Matting surpasses state-of-the-art methods in accuracy and spatio-temporal consistency, enabling applications in real-world scenarios.
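The coarse-to-fine idea behind the two-stage framework can be illustrated with a minimal sketch: predict a cheap coarse alpha at reduced resolution, then recompute only the uncertain (semi-transparent) pixels at full resolution. This is a hypothetical toy illustration, not the paper's implementation; the luminance-threshold "predictors", the `lo`/`hi` uncertainty bounds, and the nearest-neighbour upsampling are all stand-ins for the masked autoencoder and sparse 3D-convolution refiner described above.

```python
import numpy as np

def coarse_alpha(frame_lowres):
    # Stand-in for stage 1 (a portrait-aware masked autoencoder in the paper):
    # here we simply treat mean luminance as a rough alpha estimate.
    return np.clip(frame_lowres.mean(axis=-1), 0.0, 1.0)

def two_stage_matting(frame, lo=0.05, hi=0.95, scale=4):
    """Toy coarse-to-fine alpha matting.

    Stage 1: coarse alpha at 1/scale resolution, upsampled back.
    Stage 2: refine only 'uncertain' pixels (lo < alpha < hi), mimicking
    sparse refinement of critical semi-transparent regions.
    """
    h, w, _ = frame.shape
    small = frame[::scale, ::scale]                      # cheap downsample
    coarse = coarse_alpha(small)
    # nearest-neighbour upsample of the coarse prediction
    alpha = np.repeat(np.repeat(coarse, scale, 0), scale, 1)[:h, :w]
    # uncertainty mask: pixels that are neither clearly fg nor clearly bg
    mask = (alpha > lo) & (alpha < hi)
    # stand-in refinement: recompute those pixels at full resolution
    alpha[mask] = np.clip(frame.mean(axis=-1)[mask], 0.0, 1.0)
    return alpha, mask
```

The design point this sketches is the cost model: the expensive full-resolution computation touches only the masked pixels, which is what makes a resolution-agnostic refiner (sparse convolution over critical regions) tractable at 4K.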