

Workshop

Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models

Perampalli Shravan Nayak · Mehar Bhatia · Qian Yang · Kanishk Jain · Rabiul Awal · David Adelani · Spandana Gella · Siva Reddy · Vered Shwartz · Yash Goyal · Sjoerd Steenkiste · Karolina Stanczak · Aishwarya Agrawal

104 E

Thu 12 Jun, 7 a.m. PDT

Keywords: Multimodal learning

The CVPR community has long focused on evaluating AI systems for their general scene-understanding capabilities. However, as these models are deployed globally, it is essential that they also understand cultural concepts and values so that they serve the diverse needs of their users. This workshop expands the frontiers of computer vision by bringing together researchers from computer vision, natural language processing, AI ethics, and cultural anthropology to discuss how to build geo-diverse and culturally aware vision-language models (and AI models more broadly). First, the workshop will focus on the tasks, benchmarks, and metrics we should develop to evaluate and advance AI systems' capabilities in this area, and will explore promising approaches to overcoming the associated challenges. Second, the workshop will benchmark progress in geo-diverse and cultural understanding of vision-language models through the CulturalVQA and GlobalRG challenges, which test critical abilities such as visual question answering and grounding in culturally diverse scenarios. The insights from this workshop extend beyond computer vision, with significant implications for fields such as healthcare, education, and e-commerce, where culturally aligned AI can enhance user experiences. Additionally, the workshop aims to inspire further research in AI ethics, fairness, and responsible AI deployment.
