

Workshop

Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models

Perampalli Shravan Nayak · Mehar Bhatia · Qian Yang · Kanishk Jain · Rabiul Awal · David Adelani · Spandana Gella · Siva Reddy · Vered Shwartz · Yash Goyal · Sjoerd Steenkiste · Karolina Stanczak · Aishwarya Agrawal

104 E

Thu 12 Jun, 7 a.m. PDT

Keywords: Multimodal learning

The CVPR community has long focused on evaluating AI systems for their general scene-understanding capabilities. However, as these models are deployed globally, it is essential that they also understand cultural concepts and values so that they serve the diverse needs of their users. This workshop expands the frontiers of computer vision by bringing together researchers from computer vision, natural language processing, AI ethics, and cultural anthropology to discuss how to build geo-diverse and culturally aware vision-language models (and AI models more broadly). First, the workshop will focus on the tasks, benchmarks, and metrics we should develop to evaluate and advance AI systems' capabilities in this area, and will explore promising approaches to overcoming the associated challenges. Second, the workshop will benchmark progress in geo-diverse and cultural understanding of vision-language models through the CulturalVQA and GlobalRG challenges, which test critical abilities such as visual question answering and grounding in culturally diverse scenarios. The insights from this workshop extend beyond computer vision, with significant implications for fields such as healthcare, education, and e-commerce, where culturally aligned AI can enhance user experiences. Additionally, the workshop aims to inspire further research in AI ethics, fairness, and responsible AI deployment.
