Poster
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani · Dinura Dissanayake · Hasindri Watawana · Noor Ahsan · Nevasini Sasikumar · Omkar Thawakar · Henok Biadglign Ademtew · Yahya Hmaiti · Amandeep Kumar · Kartik Kuckreja · Mykola Maslych · Wafa Al Ghallabi · Mihail Minkov Mihaylov · Chao Qin · Abdelrahman Shaker · Mike Zhang · Mahardika Krisna Ihsani · Amiel Gian Esplana · Monil Gokani · Shachar Mirkin · Harsh Singh · Ashay Srivastava · Endre Hamerlik · Fathinah Asma Izzati · Fadillah Adamsyah Maani · Sebastian Cavada · Jenny Chim · Rohit Gupta · Sanjay Manjunath · Kamila Zhumakhanova · Feno Heriniaina Rabevohitra · Azril Hafizi Amirudin · Muhammad Ridzuan · Daniya Najiha Abdul Kareem · Ketan Pravin More · Kunyang Li · Pramesh Shakya · Muhammad Saad · Amirpouya Ghasemaghaei · Amirbek Djanibekov · Dilshod Azizov · Branislava Jankovic · Naman Bhatia · Alvaro Cabrera Berobide · Johan Obando-Ceron · Olympiah Otieno · Fabian Farestam · Muztoba Rabbani · Sanoojan Baliah · Santosh Sanjeev · Abduragim Shtanchaev · Maheen Fatima · Thao Nguyen · Amrin Kareem · Toluwani Aremu · Nathan Augusto Zacarias Xavier · Amit Bhatkal · Hawau Olamide Toyin · Aman Chadha · Hisham Cholakkal · Rao Anwer · Michael Felsberg · Jorma Laaksonen · Thamar Solorio · Monojit Choudhury · Ivan Laptev · Mubarak Shah · Salman Khan · Fahad Shahbaz Khan
Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in multimodal research. The benchmark offers a robust and nuanced evaluation framework featuring diverse question formats, including True/False, multiple-choice, and open-ended questions, the latter further divided into short- and long-answer categories. The ALM-bench design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content across 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open- and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark will be publicly released.