Image-to-Point Cloud Feature Back-projection for Multimodal Training of 3D Semantic Segmentation
Abstract
The effective integration and use of multimodal data acquired from cameras and LiDAR is of paramount importance for perception systems. This paper proposes Image-to-Point Cloud Feature Back-Projection (IPFP), a novel method for training multimodal fusion networks that back-projects aggregated image-feature centers, computed from image pixels not aligned with the point-cloud projection, into the point-cloud feature set via an estimated depth map. As a result, image features and point cloud features reside in the same three-dimensional space, allowing image information to naturally enrich the point cloud during the network's forward pass. The back-projection can be enabled selectively -- for instance, at training time -- and disabled when multimodal data is unavailable -- for example, at test time when only LiDAR sensors are present. Experimental results demonstrate that IPFP consistently improves state-of-the-art 3D semantic segmentation models while retaining the ability to process LiDAR-only data at inference time.
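To make the core idea concrete, the following is a minimal, framework-agnostic sketch of lifting per-pixel image features into the LiDAR coordinate frame via an estimated depth map, so that they can be merged with point-cloud features. The function name, tensor layouts, pinhole-camera intrinsics K, and camera-to-LiDAR extrinsic are illustrative assumptions, not the paper's actual implementation.

    import numpy as np

    def back_project_image_features(feat, depth, K, cam_to_lidar):
        """Lift per-pixel image features into the LiDAR frame (illustrative sketch).

        feat         : (C, H, W) image feature map
        depth        : (H, W) estimated depth in meters
        K            : (3, 3) pinhole camera intrinsics
        cam_to_lidar : (4, 4) extrinsic transform from camera to LiDAR frame
        returns      : points (N, 3) in LiDAR coordinates, features (N, C)
        """
        C, H, W = feat.shape

        # Pixel grid in homogeneous image coordinates [u, v, 1]^T.
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)

        # Back-project each pixel: X_cam = depth * K^{-1} [u, v, 1]^T.
        d = depth.reshape(1, -1)
        valid = d[0] > 0  # keep only pixels with a valid depth estimate
        cam_pts = (np.linalg.inv(K) @ pix) * d

        # Move the lifted points from the camera frame into the LiDAR frame.
        cam_pts_h = np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])
        lidar_pts = (cam_to_lidar @ cam_pts_h)[:3].T

        feats = feat.reshape(C, -1).T
        return lidar_pts[valid], feats[valid]

Once expressed as 3D points, the image features can be aggregated into or concatenated with the LiDAR point features during the forward pass, and simply omitted when only LiDAR data is available.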