PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video

Dong Wu · Zike Yan · Hongbin Zha

Arch 4A-E Poster #189
Fri 21 Jun 10:30 a.m. PDT — noon PDT


We introduce the Panoptic 3D Reconstruction task, a unified and holistic scene understanding task for monocular video. We present PanoRecon, a novel framework for this new task that performs online geometry reconstruction along with dense semantic and instance labeling. Specifically, PanoRecon incrementally performs panoptic 3D reconstruction for each video fragment, which consists of multiple consecutive key frames, from a volumetric feature representation using feed-forward neural networks. We adopt a depth-guided back-projection strategy to sparsify and purify the volumetric feature representation. We further introduce a voxel clustering module to obtain object instances in each local fragment, and then design a tracking and fusion algorithm that integrates instances from different fragments to ensure temporal coherence. This design enables PanoRecon to yield a coherent and accurate panoptic 3D reconstruction. Experiments on ScanNetV2 demonstrate very competitive geometry reconstruction results compared with state-of-the-art reconstruction methods, as well as promising 3D panoptic segmentation results with only RGB input, while running in real time. Code will be publicly available upon acceptance.
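
The abstract does not say how instances from different fragments are matched, so the following is only an illustrative sketch of one plausible tracking-and-fusion step, not the authors' algorithm. It assumes each instance is a set of integer voxel coordinates and that a fragment instance is greedily matched to the existing global instance with the highest voxel IoU. The names `iou` and `fuse_fragment` and the threshold `iou_threshold` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Voxel = Tuple[int, int, int]


def iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection-over-union of two voxel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fuse_fragment(
    global_instances: Dict[int, Set[Voxel]],
    fragment_instances: Dict[int, Set[Voxel]],
    iou_threshold: float = 0.25,  # assumed value, not taken from the paper
) -> Dict[int, int]:
    """Merge one fragment's instances into the global map (in place).

    Returns a mapping from each local fragment id to its global id.
    """
    mapping: Dict[int, int] = {}
    matched: Set[int] = set()  # global ids already used by this fragment
    next_id = max(global_instances, default=-1) + 1

    for local_id, voxels in fragment_instances.items():
        best_id: Optional[int] = None
        best_iou = iou_threshold
        for gid, gvoxels in global_instances.items():
            if gid in matched:
                continue
            score = iou(voxels, gvoxels)
            if score >= best_iou:
                best_id, best_iou = gid, score

        if best_id is None:
            # No sufficiently overlapping instance: start a new global one.
            best_id = next_id
            next_id += 1
            global_instances[best_id] = set()

        # Fuse the fragment's voxels into the matched global instance.
        global_instances[best_id] |= voxels
        matched.add(best_id)
        mapping[local_id] = best_id

    return mapping
```

Calling `fuse_fragment` once per incoming fragment keeps instance ids stable over time: an object seen again in a later fragment reuses its global id when the voxel overlap is large enough, and otherwise receives a fresh id.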

