PixFoundation: Workshop on Pixel-level Vision Foundation Models
Mennatullah Siam · Stella X. Yu · Sangwoo Mo · Leonid Sigal · Raoul de Charette · Tanzila Rahman · He Zhao · Aoran Xiao
101 E
Thu 12 Jun, 1 p.m. PDT
Keywords: Foundation models
In recent years, foundation models have gained significant traction and demonstrated notable success. These models have been shown to adapt effectively across a variety of downstream tasks, with strong generalization capabilities, especially in zero-shot and few-shot scenarios. There is growing interest and progress specifically in vision foundation models (VFMs). Recent examples include models trained with self-supervision, such as the DINO series, and those leveraging image/text data, such as CLIP, Flamingo, LLaVA, and Cambrian. Various pixel-level vision foundation models have also emerged for tasks such as image/video referring segmentation and depth estimation. Our workshop aims to bring together researchers dedicated to developing and adapting vision foundation models for pixel-level understanding tasks, including image/video segmentation, referring image/video segmentation and reasoning, tracking, depth estimation, and motion estimation. We will explore major directions in pixel-level understanding with vision foundation models and discuss the opportunities they present, particularly in low-resource settings where they could have a positive societal impact. We will also discuss the risks associated with these models and explore methods to mitigate them. The workshop features seven invited talks from a mix of emerging and established researchers, along with poster sessions and selected spotlight presentations.