Poster Sat, Jun 6, 2026 • 3:45 PM – 5:45 PM PDT ExHall A & F 404

3D-Object Perception Transformer (3PT)

Agastya Kalra ⋅ Tim Salzmann ⋅ Guy Stoppi ⋅ Dmitrii Marin ⋅ Rishav Agarwal ⋅ Vage Taamazyan ⋅ Martin Bokeloh ⋅ Stefan Hinterstoisser ⋅ Anton Boykov ⋅ Alberto Dall'Olio ⋅ Pravin Dangol ⋅ Kartik Venkataraman ⋅ Huaijin Chen

Highlight


Abstract

Current approaches to zero-shot 3D-object perception typically rely on ensembles of frozen foundation models. This limits deep object understanding and cross-domain generalization, leaving performance inadequate for real-world deployment. The 3D-Object Perception Transformer (3PT) addresses this limitation by unifying detection, segmentation, and 6DoF pose estimation in a single framework trained directly for 3D-object perception. Built on two large-scale trained Transformers that specialize in 2D and 3D object-centric scene understanding, respectively, 3PT continuously refines its object representations without depth input and improves its 3D understanding by incorporating multi-view information. 3PT surpasses task-specialized models for detection and pose estimation, often by double-digit percentage improvements on the diverse BOP benchmarks. With its high accuracy and robustness, 3PT is well suited for practical industrial robotics applications such as bin picking and precise insertion.