

Open-Vocabulary 3D Semantic Segmentation with Foundation Models

Li Jiang · Shaoshuai Shi · Bernt Schiele

Arch 4A-E Poster #168
Award: Highlight
Fri 21 Jun 10:30 a.m. PDT — noon PDT


In dynamic 3D environments, the ability to recognize a diverse range of objects without the constraints of predefined categories is indispensable for real-world applications. In response to this need, we introduce OV3D, an innovative framework designed for open-vocabulary 3D semantic segmentation. OV3D leverages the broad open-world knowledge embedded in vision and language foundation models to establish a fine-grained correspondence between 3D points and textual entity descriptions. These entity descriptions are enriched with contextual information, enabling a more open and comprehensive understanding. By seamlessly aligning 3D point features with entity text features, OV3D empowers open-vocabulary recognition in the 3D domain, achieving state-of-the-art open-vocabulary semantic segmentation performance across multiple datasets, including ScanNet, Matterport3D, and nuScenes. Code will be available.
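
As a rough illustration of what aligning 3D point features with text features makes possible at inference time, the sketch below labels each point with the category whose text embedding is most similar under cosine similarity. This is a generic open-vocabulary classification pattern, not OV3D's actual implementation: the function names (`l2_normalize`, `classify_points`), the toy 3-dimensional features, and the category list are all hypothetical. In practice the point features would come from a trained 3D backbone and the text features from a language model's text encoder.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def classify_points(point_feats, text_feats, class_names):
    """Assign each point the class whose text embedding is most similar.

    point_feats: (N, D) per-point features (assumed already aligned to the text space)
    text_feats:  (C, D) one embedding per category name or description
    class_names: list of C strings; can be changed at query time without retraining
    """
    p = l2_normalize(point_feats)
    t = l2_normalize(text_feats)
    similarity = p @ t.T                 # (N, C) cosine similarities
    best = similarity.argmax(axis=1)     # index of the closest category per point
    return [class_names[i] for i in best]

# Toy data (hypothetical): three categories and three points in a 3-D feature space.
class_names = ["chair", "table", "wall"]
text_feats = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
point_feats = np.array([[0.9, 0.1, 0.0],
                        [0.1, 0.2, 2.0],
                        [0.0, 3.0, 0.5]])

labels = classify_points(point_feats, text_feats, class_names)
print(labels)
```

Because the category set enters only through `text_feats`, swapping in embeddings for new category names changes the label space without touching the point features, which is the sense in which such a setup is open-vocabulary.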
