XPaintNet: An eXtreme Lightweight Framework for Stereoscopic Conversion without Inpainting Network
Abstract
With the rapid growth of stereoscopic 3D devices, real-time stereoscopic conversion has become increasingly essential. However, most existing approaches rely on depth estimation, forward warping, and heavy inpainting networks, resulting in high computational cost and artifacts near occlusion boundaries. Diffusion-based models have also been explored, but they suffer from iterative sampling and geometric inconsistency, making them unsuitable for real-time deployment. To address these issues, we propose Bi-Warp, a simple yet effective approach that synthesizes the right view without an inpainting network by leveraging warping operations. Our approach estimates the backward flow, approximates the corresponding forward flow, and generates two candidate right views via bidirectional warping. A learnable mask adaptively fuses the candidates, preserving left–right geometric consistency. Building on Bi-Warp, we introduce XPaintNet, a lightweight network that achieves visual quality comparable to state-of-the-art methods while running in real time at over 100 FPS at 2K resolution.
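The bidirectional warping and mask fusion described above can be illustrated with a minimal NumPy sketch, restricted to horizontal flow on a single-channel image. It is not the XPaintNet implementation: the function names, the sign convention of the flow, the nearest-pixel splatting, and in particular the approximation of the forward flow by negating the backward flow are all assumptions made for illustration. In the method described above, the backward flow is estimated and the fusion mask is learned; here both are simply passed in as arrays.

```python
import numpy as np

def backward_warp(left, flow_b):
    """Gather: right[y, x] = left[y, x + flow_b[y, x]], linear interpolation along x."""
    h, w = left.shape
    xs = np.clip(np.arange(w)[None, :] + flow_b, 0, w - 1)  # source x in the left view
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    t = xs - x0
    rows = np.arange(h)[:, None]
    return (1 - t) * left[rows, x0] + t * left[rows, x1]

def approx_forward_flow(flow_b):
    """Hypothetical stand-in: assume the forward flow is the negated backward flow."""
    return -flow_b

def forward_warp(left, flow_f):
    """Scatter: left[y, x] is written to right[y, round(x + flow_f[y, x])].

    Returns the warped image and a boolean map of target pixels that received a value.
    """
    h, w = left.shape
    out = np.zeros((h, w), dtype=float)
    hit = np.zeros((h, w), dtype=bool)
    xt = np.rint(np.arange(w)[None, :] + flow_f).astype(int)  # target x in the right view
    valid = (xt >= 0) & (xt < w)
    rows = np.broadcast_to(np.arange(h)[:, None], (h, w))
    out[rows[valid], xt[valid]] = left[valid]
    hit[rows[valid], xt[valid]] = True
    return out, hit

def fuse(cand_b, cand_f, mask):
    """Per-pixel convex combination of the two candidates; mask values lie in [0, 1]."""
    return mask * cand_b + (1 - mask) * cand_f

def bi_warp(left, flow_b, mask):
    """Build both candidate right views and blend them with the given mask."""
    cand_b = backward_warp(left, flow_b)
    cand_f, _ = forward_warp(left, approx_forward_flow(flow_b))
    return fuse(cand_b, cand_f, mask)
```

With a constant flow, the two candidates agree wherever the forward warp lands a pixel, so any mask yields the same result there. They differ only in the region the forward warp leaves empty, which is where the choice of mask matters.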