Workshop
What is Next in Multimodal Foundation Models?
Hilde Kuehne · Rogerio Feris · Leonid Karlinsky · Anna Kukleva · Ameya Prabhu · Wei Lin · Muhammad Jehanzeb Mirza · Sivan Doveh · Roei Herzig
Thu 12 Jun 6:30 a.m. PDT — 10:30 a.m. PDT
Start and end times are correct, but the detailed schedule may be incomplete or out of date. Please cross-reference the workshop's website, if available, to verify schedule details. (Added by CVPR.)
Timezone: America/Los_Angeles
Schedule
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding (Paper)
Yang Jiao · Haibo Qiu · Zequn Jie · Shaoxiang Chen · Jingjing Chen · Lin Ma · Yu-Gang Jiang
Understanding Depth and Height Perception in Large Visual-Language Models (Paper)
Shehreen Azad · Yash Jain · Rishit Garg · Vibhav Vineet · Yogesh Rawat
Repurposing SAM for User-Defined Semantics Aware Segmentation (Paper)
Rohit Kundu · Sudipta Paul · Arindam Dutta · Amit K. Roy-Chowdhury
PLVM: A tuning-free approach for Personalized Large Vision-Language Model (Paper)
Chau Pham
How Good is my Video-LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs (Paper)
Muhammad Uzair Khattak
An Interactive Agent Foundation Model (Paper)
Zane Durante