FRM: Linear-Time 3D Reconstruction via Test-Time Training
Haian Jin ⋅ Rundi Wu ⋅ Tianyuan Zhang ⋅ Ruiqi Gao ⋅ Jonathan T. Barron ⋅ Noah Snavely ⋅ Aleksander Holynski
Abstract
Feed-forward transformer models such as VGGT and $\pi^3$ are highly accurate, but their computational cost grows quadratically with the number of input images, making them slow to evaluate on large collections. More efficient approaches reduce this cost at the expense of reconstruction quality. We introduce the Fast Reconstruction Model (FRM), a stateful feed-forward reconstruction model with a bidirectional architecture that scales linearly in the number of input views while matching or surpassing the reconstruction quality of quadratic-time methods. FRM employs test-time training layers to compress input images into a compact hidden scene state during a single forward pass, enabling it to reconstruct 3D scenes at up to 75 FPS on a single H100 GPU, over 20 times faster than state-of-the-art methods such as VGGT. This hidden state also serves as an implicit scene representation that can be queried at real-time rates to produce colored point maps from novel views.
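Although the abstract does not specify FRM's layer design, the linear-time behavior it describes can be illustrated with a common test-time-training formulation in which the hidden state is itself the weight matrix of a small inner model, updated by one gradient step per incoming view. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; `TTTLayer`, its dimensions, update rule, and learning rate are all assumptions for exposition, not FRM's actual implementation.

```python
# Hypothetical sketch of a test-time-training (TTT) layer with a
# fixed-size hidden scene state -- NOT the authors' implementation.
import torch
import torch.nn as nn

class TTTLayer(nn.Module):
    """Toy TTT layer: the hidden state is the weight matrix W of a small
    linear model, updated by one gradient step per view on a
    self-supervised regression loss (a standard TTT formulation)."""
    def __init__(self, dim: int, lr: float = 0.1):
        super().__init__()
        self.dim, self.lr = dim, lr
        # Learned projections that produce the TTT "inputs" and "targets".
        self.proj_in = nn.Linear(dim, dim)
        self.proj_tgt = nn.Linear(dim, dim)

    def init_state(self, batch: int, device) -> torch.Tensor:
        # Hidden scene state: one fixed-size weight matrix per batch element.
        return torch.zeros(batch, self.dim, self.dim, device=device)

    def update(self, W: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_tokens, dim) for ONE view. Each update touches
        # only this view and the fixed-size state, so cost is O(1) in the
        # number of previously seen views.
        x = self.proj_in(tokens)           # TTT inputs
        y = self.proj_tgt(tokens)          # TTT regression targets
        pred = torch.bmm(x, W)             # current state's prediction
        grad = torch.bmm(x.transpose(1, 2), pred - y) / x.shape[1]
        return W - self.lr * grad          # one gradient step on ||xW - y||^2

    def query(self, W: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # Read out of the compressed scene state, e.g. for novel views.
        return torch.bmm(self.proj_in(q), W)

# Usage: stream views through the layer in a single forward pass.
layer = TTTLayer(dim=64)
views = [torch.randn(2, 128, 64) for _ in range(8)]  # 8 views, 128 tokens each
W = layer.init_state(batch=2, device=views[0].device)
for v in views:
    W = layer.update(W, v)
novel = layer.query(W, torch.randn(2, 128, 64))      # real-time readout
```

Because each `update` consumes only the current view's tokens and a constant-size state, processing $n$ views costs $O(n)$ rather than the $O(n^2)$ of all-pairs attention, and `query` reads from the compressed state without revisiting the input images, which is how a hidden state of this kind can double as an implicit scene representation.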