

Probing the 3D Awareness of Visual Foundation Models

Mohamed El Banani · Amit Raj · Kevis-kokitsi Maninis · Abhishek Kar · Yuanzhen Li · Michael Rubinstein · Deqing Sun · Leonidas Guibas · Justin Johnson · Varun Jampani

Arch 4A-E Poster #218
Fri 21 Jun 10:30 a.m. PDT — noon PDT


Recent advances in large-scale pretraining have yielded visual foundation models with strong generalization abilities. Such models aim to learn representations that are useful for a wide range of downstream tasks such as classification, segmentation, and generation. These models have proven successful on 2D tasks, where they can delineate and localize objects, but how much do they really understand in 3D? In this work, we analyze the 3D awareness of the representations learned by visual foundation models. We argue that 3D awareness implies, at a minimum, (1) representing the 3D structure of the visible surface and (2) producing representations that are consistent across views. We conduct a series of experiments that analyze the learned representations using task-specific probes and zero-shot inference procedures on frozen features, revealing several limitations of current foundation models.
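The probing setup described in the abstract can be sketched as follows. This is a minimal illustration with synthetic stand-ins, not the authors' actual pipeline: the frozen backbone is represented by a fixed feature matrix, and only a lightweight linear probe is fit on top of it (here, for a per-patch depth target).

```python
import numpy as np

# Minimal sketch of readout-style probing: the backbone is frozen, so its
# features behave like a fixed matrix; only the probe is trained.
# All names, shapes, and targets below are illustrative assumptions.

rng = np.random.default_rng(0)

n_patches, feat_dim = 512, 64                    # "frozen" patch features
features = rng.normal(size=(n_patches, feat_dim))

# Synthetic per-patch depth targets: a hidden linear map plus noise
# stands in for real monocular-depth annotations.
w_true = rng.normal(size=feat_dim)
depth = features @ w_true + 0.1 * rng.normal(size=n_patches)

# Fit the linear probe in closed form (ridge regression); the backbone
# "weights" (the feature matrix) are never updated.
lam = 1e-3
w_probe = np.linalg.solve(
    features.T @ features + lam * np.eye(feat_dim),
    features.T @ depth,
)

pred = features @ w_probe
r2 = 1.0 - np.sum((depth - pred) ** 2) / np.sum((depth - depth.mean()) ** 2)
print(f"probe R^2 on synthetic depth: {r2:.3f}")
```

A high probe score indicates that the property (here, the synthetic depth signal) is linearly decodable from the frozen features; a low score suggests the representation does not encode it, which is the logic the paper's probing experiments rely on.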
