

3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

Chenfeng Xu · Huan Ling · Sanja Fidler · Or Litany

Arch 4A-E Poster #93
Thu 20 Jun 10:30 a.m. PDT — noon PDT


3DiffTection introduces a novel method for 3D object detection from single images, utilizing a 3D-aware diffusion model for feature extraction. Addressing the resource-intensive nature of annotating large-scale 3D image data, our approach leverages pretrained diffusion models, traditionally used for 2D tasks, and adapts them for 3D detection through geometric and semantic tuning. Geometrically, we enhance the model to perform view synthesis from single images, incorporating an epipolar warp operator. This process utilizes easily accessible posed image data, eliminating the need for manual annotation. Semantically, the model is further refined on target detection data. Both stages utilize ControlNet, ensuring the preservation of original feature capabilities. Through our methodology, we obtain 3D-aware features that excel in identifying cross-view point correspondences. In 3D detection, 3DiffTection substantially surpasses previous methods, e.g., Cube-RCNN, by 9.43% in AP3D on the Omni3D-ARKitScenes dataset. Furthermore, 3DiffTection demonstrates robust label efficiency and generalizes well to cross-domain data, nearly matching fully-supervised models in zero-shot scenarios.
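
The abstract does not spell out the epipolar warp operator, so the following is only a minimal sketch of the standard epipolar constraint that such an operator would presumably build on, not the authors' implementation. Given a relative camera pose, a pixel in the source view maps to a line in the target view along which its correspondence must lie. The function names, the pinhole intrinsics, and the pose convention (X_tgt = R @ X_src + t) are all assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def fundamental_matrix(K_src, K_tgt, R, t):
    """Fundamental matrix F with x_tgt^T F x_src = 0.

    Assumes points map between camera frames as X_tgt = R @ X_src + t
    (a convention chosen for this sketch).
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K_tgt).T @ E @ np.linalg.inv(K_src)

def epipolar_line(F, pixel_src):
    """Line (a, b, c) in the target image with a*u + b*v + c = 0."""
    x = np.array([pixel_src[0], pixel_src[1], 1.0])
    return F @ x

def project(K, X):
    """Pinhole projection of a 3D point in camera coordinates to pixels."""
    x = K @ X
    return x[:2] / x[2]

# Hypothetical camera setup, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = 0.1  # small rotation about the y axis, in radians
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.2, 0.0, 0.0])

X_src = np.array([0.1, -0.2, 2.0])     # 3D point in the source camera frame
p_src = project(K, X_src)              # its pixel in the source view
p_tgt = project(K, R @ X_src + t)      # its pixel in the target view

F = fundamental_matrix(K, K, R, t)
line = epipolar_line(F, p_src)
# Algebraic residual of the true correspondence against the epipolar line.
residual = line @ np.array([p_tgt[0], p_tgt[1], 1.0])
```

In this sketch the residual is zero up to floating-point error, which is the property a warp operator can exploit: candidate features for a source pixel need only be gathered along one line in the other view rather than over the whole image.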
