Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images
Abstract
Reconstructing and semantically interpreting 3D scenes from sparse 2D views remains a fundamental challenge in computer vision. Conventional methods often decouple semantic understanding from reconstruction or require costly per-scene optimization, which limits their scalability and generalizability. In this paper, we introduce Uni3R, a novel feed-forward framework that reconstructs 3D scenes from unposed multi-view images as a unified representation of 3D Gaussians endowed with semantic features. This unified representation facilitates high-fidelity novel view synthesis, open-vocabulary 3D semantic segmentation, and depth prediction, all within a single feed-forward pass. Extensive experiments demonstrate that our method establishes a new state of the art across multiple benchmarks, including RE10K and ScanNet. Our work marks a step towards a new paradigm of generalizable 3D scene reconstruction and understanding.
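To make the single-pass interface concrete, the following is a minimal PyTorch sketch of how a feed-forward model can map unposed multi-view images to a unified semantic Gaussian representation. All module names, channel layouts, and shapes here are illustrative assumptions, not the authors' implementation; the actual model would use a much richer multi-view backbone and a differentiable Gaussian rasterizer for the downstream readouts.

```python
import torch
import torch.nn as nn

class Uni3RSketch(nn.Module):
    """Illustrative sketch (hypothetical names/shapes): one feed-forward pass
    from unposed views to Gaussians plus language-aligned semantic features."""

    def __init__(self, feat_dim: int = 64, sem_dim: int = 32):
        super().__init__()
        # Stand-in for a shared multi-view encoder (e.g., a cross-view transformer).
        self.encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
        # Per-pixel Gaussian parameters: mean(3) + scale(3) + rotation(4) + opacity(1) + color(3) = 14.
        self.gaussian_head = nn.Conv2d(feat_dim, 14, 1)
        # Per-Gaussian semantic feature for open-vocabulary queries.
        self.semantic_head = nn.Conv2d(feat_dim, sem_dim, 1)

    def forward(self, images: torch.Tensor) -> dict:
        b, v, c, h, w = images.shape                # (batch, views, 3, H, W), no camera poses
        feats = self.encoder(images.flatten(0, 1))  # shared features across all views
        return {
            "gaussians": self.gaussian_head(feats).view(b, v, 14, h, w),
            "semantics": self.semantic_head(feats).view(b, v, -1, h, w),
        }

views = torch.randn(1, 2, 3, 64, 64)   # two unposed RGB views
outputs = Uni3RSketch()(views)          # single forward pass; no per-scene optimization
```

The key design point the sketch illustrates is that novel view synthesis, open-vocabulary segmentation, and depth prediction all read out of this one predicted representation, rather than each task running its own per-scene fitting loop.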