Poster Fri, Jun 5, 2026 • 9:45 AM – 11:45 AM PDT ExHall A-F 42

Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning

Zhongxiao Cong ⋅ Qitao Zhao ⋅ Minsik Jeon ⋅ Shubham Tulsiani

Abstract

We propose Flow3r, a scalable framework for visual geometry learning that leverages flow prediction to guide learning using unlabeled monocular videos. Current 3D/4D reconstruction systems primarily rely on dense geometry and pose supervision, and cannot easily generalize to diverse dynamic real-world scenes. In this work, we propose a mechanism to augment training directly from unlabeled videos, leveraging dense 2D correspondences (or ‘flow’) between arbitrary image pairs as supervision. Our key insight is that a factored flow prediction module, which computes the flow between two images using ‘geometry latents’ from one image and the ‘pose latent’ from the other, can guide visual geometry learning. We first highlight the benefits and scalability of flow supervision in controlled settings and then leverage large-scale unlabeled data to improve off-the-shelf visual geometry models. We evaluate Flow3r across diverse 3D benchmarks and demonstrate competitive or state-of-the-art performance, even surpassing supervised models trained with more labeled data.
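To give some intuition for why flow can be factored into geometry from one image and pose from the other, the sketch below shows the classical rigid-scene analogue: the flow from image A to image B is fully determined by A's per-pixel 3D points and B's camera pose relative to A. This is an illustrative assumption, not the authors' module. Flow3r predicts flow from learned latents, and the abstract does not describe its architecture. The names `unproject_depth` and `induced_flow`, the pinhole intrinsics `K`, and the static-scene assumption are all hypothetical choices made for this example.

```python
import numpy as np


def unproject_depth(depth, K):
    """Lift a depth map (H, W) to per-pixel 3D points (H, W, 3) in the camera frame.

    Hypothetical helper: assumes a pinhole camera with intrinsics K (3x3).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix_h = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixels
    rays = pix_h @ np.linalg.inv(K).T                    # K^-1 [u, v, 1]^T
    return rays * depth[..., None]


def induced_flow(points_a, K, R_ab, t_ab):
    """Flow from image A to image B implied by geometry and relative pose.

    points_a: (H, W, 3) 3D point for each pixel of A, in A's camera frame
              (the "geometry" factor, taken from image A only).
    R_ab, t_ab: rotation (3x3) and translation (3,) mapping A's frame to B's
              frame (the "pose" factor, associated with image B only).
    Assumes a static scene and shared intrinsics K.
    """
    H, W, _ = points_a.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix_a = np.stack([u, v], axis=-1)

    pts_b = points_a @ R_ab.T + t_ab           # move points into B's frame
    proj = pts_b @ K.T                          # apply intrinsics
    pix_b = proj[..., :2] / proj[..., 2:3]      # perspective divide
    return pix_b - pix_a                        # 2D displacement per pixel


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 2.0],
                  [0.0, 100.0, 1.5],
                  [0.0, 0.0, 1.0]])
    points = unproject_depth(np.full((3, 4), 2.0), K)
    flow = induced_flow(points, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
    print(flow[0, 0])
```

In this toy setting, an identity pose yields zero flow, and a pure sideways translation shifts every pixel by an amount inversely proportional to its depth. Dynamic scenes break the static-scene assumption used here, which is one reason a learned flow predictor is a different object from this closed-form projection.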