AnchorSplat: Feed-Forward 3D Gaussian Splatting With 3D Geometric Priors
Abstract
Scene-level 3D reconstruction has attracted increasing attention, and feed-forward 3D Gaussian Splatting (3DGS) has emerged as a promising paradigm for novel view synthesis. However, most existing methods adopt a pixel-aligned formulation that maps each 2D pixel to a 3D Gaussian, tightly coupling the number of Gaussians to the resolution and number of input images. This leads to several limitations: (i) reconstruction quality is sensitive to the quantity and viewpoint coverage of the input images, often causing Gaussians to accumulate densely in regions observed by many views; (ii) alignment errors become more pronounced under sparse-view conditions; and (iii) the lack of explicit geometric consistency can degrade depth estimation and downstream 3D tasks. In this paper, we propose AnchorSplat, a novel multi-view feed-forward 3DGS framework for scene-level reconstruction that departs from pixel-aligned prediction and instead represents the scene directly in 3D space. AnchorSplat introduces anchor-aligned Gaussians guided by geometric priors (e.g., sparse point clouds, voxels, or RGB-D point clouds), yielding a geometry-aware representation that is independent of image resolution and the number of input views. This design substantially reduces the number of required Gaussians, improving computational efficiency while enhancing reconstruction fidelity. The framework is trained in two stages: a Gaussian decoder first predicts anchor-aligned Gaussians, and a subsequent Gaussian refiner then improves their quality and view consistency. Experiments on the ScanNet benchmark demonstrate that AnchorSplat achieves state-of-the-art performance, producing more view-consistent and plausible 3D Gaussian reconstructions. Code, videos, and pretrained models will be released on the project page.