FreeScale: Scaling 3D Scenes via Certainty-Aware Free-View Generation
Abstract
The development of generalizable Novel View Synthesis (NVS) models is critically limited by the scarcity of large-scale training data with diverse and accurate camera trajectories. Real-world captures are photorealistic but typically sparse and discrete; synthetic data scales easily but suffers from a domain gap and often lacks realistic semantics. We introduce FVGen, a novel framework that leverages scene reconstruction to transform limited real-world image sequences into a scalable source of high-quality training data. Our key insight is that an imperfect reconstructed scene serves as a rich geometric proxy, but naively sampling views from it amplifies reconstruction artifacts. We therefore propose a certainty-aware free-view sampling strategy that identifies novel viewpoints that are both semantically meaningful and minimally affected by reconstruction errors. We demonstrate FVGen's effectiveness by scaling up the training of feedforward NVS models, achieving a 2.6 dB improvement on challenging out-of-distribution benchmarks. Furthermore, we show that the generated data can actively enhance per-scene 3D Gaussian Splatting optimization, yielding consistent gains across multiple datasets. Our work provides a practical and powerful data generation engine that addresses a fundamental bottleneck in 3D vision.
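To make the certainty-aware sampling idea concrete, the following is a minimal, illustrative sketch, not the paper's actual algorithm: it assumes each candidate viewpoint can be assigned a hypothetical per-view certainty score (here, the mean reconstruction confidence over the pixels that view would observe), filters out views dominated by reconstruction errors, and samples the remainder to diversify training trajectories.

```python
# Illustrative sketch of certainty-aware free-view sampling.
# The certainty heuristic below is an assumption for exposition,
# not the method described in the paper.
import random


def certainty_score(view, view_confidences):
    # Hypothetical score: average per-pixel reconstruction confidence
    # for the geometry visible from this view (1.0 = fully reliable).
    visible = view_confidences.get(view, [])
    if not visible:
        return 0.0
    return sum(visible) / len(visible)


def sample_free_views(candidate_views, view_confidences, k, threshold=0.8):
    # Keep only viewpoints whose certainty exceeds a threshold, then
    # draw k of them at random to diversify the generated trajectories.
    reliable = [v for v in candidate_views
                if certainty_score(v, view_confidences) >= threshold]
    return random.sample(reliable, min(k, len(reliable)))
```

In practice the confidence map would come from the reconstruction itself (e.g., per-Gaussian or per-surfel error estimates), and the selection would also enforce semantic coverage; this sketch only shows the error-aware filtering step.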