Workshop

Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo 2)

Ashkan Khakzar · A. Koepke · Ameya Prabhu · Jindong Gu · Francesco Pinto · Arsha Nagrani · Boyi Li · Philip H.S. Torr · Trevor Darrell

210

Wed 11 Jun, noon PDT

Keywords: Analysis of Foundation Models

TLDR: This workshop focuses on analysis and evaluations to understand and identify emerging visual capabilities and pinpoint visual limits in foundation models.

Visual information processing is being transformed by foundation models. Trained on massive datasets using self-supervised and generative methods, these models exhibit the emergence of sophisticated visual abilities, such as depth perception, object recognition, and part discovery, without explicit programming or supervision. This shift marks a new paradigm in which neural models derive visual understanding from the intrinsic structures and patterns present in the data rather than from supervisory signals tied to a specific visual task. Yet questions remain about how to systematically analyze and evaluate these emergent capabilities. Recent studies have also highlighted the models' visual limitations, emphasizing the need for innovative evaluation methods to identify these shortcomings. By evaluating and understanding both the capabilities and the limits of these models, we can better compare different learning algorithms and architectures in terms of how they represent the visual world.