PixFoundation: Workshop on Pixel-level Vision Foundation Models
Mennatullah Siam · Stella X. Yu · Sangwoo Mo · Leonid Sigal · Raoul de Charette · Tanzila Rahman · He Zhao · Aoran Xiao
101 E
Thu 12 Jun, 1 p.m. PDT
Keywords: Foundation models
In recent years, foundation models have gained significant traction and demonstrated notable success. These models have been shown to adapt effectively across a variety of downstream tasks, with strong generalization capabilities, especially in zero-shot and few-shot scenarios. There is growing interest and progress specifically in vision foundation models (VFMs). Recent examples include models trained with self-supervision, such as the DINO series, and those leveraging image/text data, such as CLIP, Flamingo, LLaVA, and Cambrian. Various pixel-level vision foundation models have also emerged for tasks such as image/video referring segmentation and depth estimation. Our workshop aims to bring together researchers dedicated to developing and adapting vision foundation models for pixel-level understanding tasks, including image/video segmentation, referring image/video segmentation and reasoning, tracking, depth estimation, and motion estimation. We will explore major directions in pixel-level understanding with vision foundation models and discuss the opportunities they present, particularly in low-resource settings where they could have a positive societal impact. We will also discuss the risks associated with these models and explore methods to mitigate them. The workshop features seven invited talks from a mix of emerging and established researchers, along with poster sessions and selected spotlight presentations.