GM-R^2: Generative Matching Learning for Unsupervised Geometric Representation and Registration
Abstract
This paper proposes GM-R^2, a novel Generative Matching Learning framework for unsupervised geometric descriptor learning and correspondence matching. By reformulating descriptor learning as geometry-conditioned cross-view image generation, GM-R^2 leverages the proxy supervision signal from structurally aligned view synthesis to implicitly enforce feature consistency across correspondences, enabling robust 3D matching. To instantiate GM-R^2, we introduce a Denoising-Agnostic Coupled ControlNet, conditioned on depth maps, as the required geometry-conditioned cross-view generator. It extends the single-view generation of the naive ControlNet to the cross-view setting through a coupled depth-map input design, and further removes the latent-noise dependency to support the geometry-only inference expected by 3D matching. Moreover, we present a Zoomable Equirectangular Projection for intrinsics-free point-cloud-to-depth mapping that adaptively zooms into the angular region occupied by the narrow-FOV input to acquire a dense range map. Extensive experiments on the 3DMatch and ScanNet datasets verify the superior precision of GM-R^2, which even surpasses supervised methods.
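The intrinsics-free point-cloud-to-depth mapping described above can be illustrated with a minimal sketch. The function below is a hypothetical, simplified stand-in for the paper's Zoomable Equirectangular Projection (the function name, grid resolution, and the simple min-max "zoom" are our assumptions, not the paper's exact algorithm): each 3D point is converted to azimuth/elevation angles, the angular grid is optionally restricted to the region the narrow-FOV cloud actually occupies, and the nearest range per pixel is kept.

```python
import numpy as np

def equirectangular_range_map(points, H=64, W=128, zoom=True):
    """Project a 3D point cloud (N, 3) to an (H, W) range map via
    equirectangular projection. Hypothetical sketch of the idea behind
    Zoomable Equirectangular Projection, not the paper's implementation.

    zoom=True restricts the angular grid to the cloud's own angular
    extent, so a narrow-FOV input fills the map densely instead of
    occupying a small patch of the full sphere."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                     # range per point
    az = np.arctan2(y, x)                                  # azimuth in [-pi, pi]
    el = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # elevation

    if zoom:  # adaptively "zoom" into the occupied angular region
        az_min, az_max = az.min(), az.max()
        el_min, el_max = el.min(), el.max()
    else:     # full sphere
        az_min, az_max = -np.pi, np.pi
        el_min, el_max = -np.pi / 2, np.pi / 2

    # Map angles to pixel indices (no camera intrinsics involved).
    u = np.clip(((az - az_min) / max(az_max - az_min, 1e-9) * (W - 1)).astype(int), 0, W - 1)
    v = np.clip(((el - el_min) / max(el_max - el_min, 1e-9) * (H - 1)).astype(int), 0, H - 1)

    rng = np.full((H, W), np.inf)
    np.minimum.at(rng, (v, u), r)      # z-buffer: keep nearest point per pixel
    rng[np.isinf(rng)] = 0.0           # empty pixels -> 0
    return rng
```

With zoom enabled, a narrow-FOV cloud yields a noticeably denser range map than projecting onto the full sphere, which is the motivation the abstract gives for the zooming step.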