Workshop
Lidia Talavera-Martínez · Willams De Lima
[ 105 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://www.latinxinai.org/cvpr-2025) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
This workshop aims to promote and increase the participation of the LatinX community in Computer Vision. The workshop will provide a platform for LatinX researchers at all levels to share academic, industrial, cultural, and social challenges; highlight prominent LatinX researchers and allies; offer resources and opportunities for career growth through sponsored registrations, mentoring, and resume sharing; and raise the visibility of women researchers within the LatinX community. While the event focuses primarily on researchers who identify as LatinX, everyone is invited to attend.
Workshop
Jun Ma · Yuyin Zhou · Vishal M. Patel · Julia Schnabel · Bo Wang
[ 212 ]
Abstract
The rapid growth of foundation models in various domains has been transformative, bringing unprecedented capabilities and advances in automated understanding. Medical vision, a pivotal segment of computer vision, is poised to greatly benefit from these advancements. This workshop delves into the integration and application of foundation models specific to the realm of medical imaging. We will cover state-of-the-art techniques for diverse medical data, such as echocardiogram, fundus, pathology, and radiology, as well as the practical challenges of implementing these models in clinical settings. Through expert-led sessions, interactive discussions, and international competitions, we aim to offer attendees a comprehensive understanding of the potential impact foundation models could have on the future of medical diagnostics and patient care.
Workshop
Iuri Frosio · Ekta Prashnani · David Durst · Rulon Raymond · Marguerite deCourcelle · Nicu Sebe · Georgios Yannakakis · Joohwan Kim
[ 210 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/cv2-2025/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Rakesh Ranjan
[ 109 ]
Abstract
With the advent of passthrough devices such as the Quest 3, Apple Vision Pro, and more recently, Orion AR glasses, users can now engage in deeply immersive experiences that blend the virtual and real worlds, often referred to as Mixed Reality (MR). Unlike traditional Virtual Reality (VR), MR presents unique challenges in computer vision, such as capturing and reconstructing real-world environments with high fidelity and augmenting them with virtual elements realistically and in real time.<br>This workshop aims to provide the research community with a deeper understanding of these MR-specific challenges and explore novel methods in areas like view synthesis, scene understanding, and efficient on-device AI, among others. Attendees will benefit from the insights of a diverse committee with expertise in 3D computer vision, graphics, human visual perception, and efficient machine learning.
Workshop
Vishwesh Nath · Jeya Maria Jose Valanarasu · Zhihong Chen · Xueyan Mei · Weidi Xie · Vishal M. Patel · Bennett Landman
[ 208 A ]
Abstract
Healthcare today stands at the intersection of technology and innovation, driven by diverse data sources—from clinical reports and electronic health records to medical imaging, vital signs, and numerous forms of unstructured data. While deep learning has significantly advanced medical imaging, the vast potential of integrating these abundant, multi-modal data streams remains largely untapped. This integration promises revolutionary improvements in patient outcomes, yet navigating this landscape poses unique and complex challenges due to the fragmented and isolated nature of healthcare data. This workshop addresses the critical questions facing researchers and practitioners: How can we effectively align and integrate multi-modal medical data? How do we tackle safety, privacy, interpretability, and the scarcity of clinically driven benchmarks?
Workshop
Yung-Hsiang Lu · Shuai Zhang · George K. Thiruvathukal
[ 207 A-D ]
Abstract
Efficient computer vision on mobile, automotive, and edge devices significantly impacts daily life, technology, and industry. This workshop will explore the latest advancements in multimodal LLMs, autonomous driving, Gaussian splatting avatars, and robotics. Additionally, discussions will delve into new optimization methods and applications, highlighting the 2025 IEEE Low Power Computer Vision Challenge (lpcv.ai), where winners of the three tracks will present their innovative solutions.
Workshop
Radu Timofte · Zongwei Wu · Florin-Alexandru Vasluianu · Yawei Li
[ Davidson C3 ]
Abstract
Image and video restoration, enhancement, and manipulation are key computer vision tasks with increasing importance across various fields.<br>The 10th edition of the NTIRE workshop seeks to provide a comprehensive overview of recent trends and advancements in these areas, facilitating interaction and potential collaboration between academic and industrial participants.<br>The NTIRE associated challenges gauge the state-of-the-art in topics such as super-resolution, efficiency, quality assessment, enhancement, normalization, removal of shadows, reflections and raindrops, HDR, light fields, raw restoration, reconstruction, event-based deblurring, cross-domain detection, depth estimation, night photography, and face restoration.<br>Building on the success of the previous editions, this event will feature presentations covering a wide selection of topics from 69 papers accepted for publication, organizers and winners of the 23 associated challenges, and invited talks provided by distinguished researchers.
Workshop
Giuseppe Serra · Ali Abdari · Alex Falcon · Beatrice Portelli · Vanessa Sklyarova · Barbara Roessle · Daniel Jung · Shunlin Lu · Ji Hou · Bichen Wu · Djamila Aouada · Gyeongsik Moon
[ 107 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/cv4metaverse-2025/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
In the ever-growing areas of Augmented Reality (AR), Virtual Reality (VR), and the expansive Metaverse, computer vision brings together the digital and physical worlds seamlessly. Its ability to understand and interpret visual information pushes these immersive technologies to new levels, enhancing user experiences, driving creative innovations, and exploring new frontiers. On the other hand, Natural Language Processing (NLP) is pivotal for deciphering human language and facilitating applications like translation and summarization. Large Language Models (LLMs) are now capable of human-level conversational skills, drastically enhancing human-machine interactions. As exemplified by CLIP and other multimodal foundation models, textual information plays a significant role in understanding visual data. As a consequence, these large models may contribute significantly to improving AR, VR, and the Metaverse, enabling hands-free navigation, voice-based commands, and immersive communication between avatars.
Workshop
Wei Ji · Hong Liu · Zhun Zhong · Zhe Zeng · Elisa Ricci · Andrew Wilson · Shin’ichi Satoh · Nicu Sebe
[ 101 E ]
Abstract
Today’s interconnected world presents unique challenges for intelligent systems in processing and integrating diverse data modalities, including text, audio, and visual data. However, traditional closed-world paradigms can fall short when faced with unseen classes and novel scenarios, which frequently emerge in complicated real-world environments. We propose the consideration of open-world learning as a way to build intelligent systems that are highly adaptable while also being robust and trustworthy, capable of tackling highly dynamic and creative tasks. Here, the integration of privacy-preserving techniques is crucial as data sources expand, particularly in high-stakes applications such as autonomous navigation systems for public safety. These systems must discern and adapt to evolving traffic patterns, weather conditions, and user behaviors in real time, underscoring the necessity of continuous learning and resilience against adversities. By exploring these critical challenges, this workshop aims to foster discussions that advance the development of trustworthy, multi-modal systems capable of thriving in open-world contexts.
Workshop
Yuhao Chen · Petia Radeva · Jiangpeng He · Bhalaji Nagarajan · Fengqing Zhu
[ 108 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/cvpr-metafood-2025) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Today, computer vision algorithms achieve near-perfect, even super-human, performance when clean, well-curated, and sufficiently large datasets are available. However, there remains a substantial gap when it comes to applying state-of-the-art computer vision algorithms to food data, particularly when dealing with food in its natural, uncontrolled environment, often referred to as “data in the wild.” This gap stems from the inherent challenges in the noisy, watermarked, and low-quality food data readily available on the internet. The MetaFood Workshop (MTF) invites the CVPR community to engage with food domain-related challenges. These challenges provide not only a demanding, realistic testing environment for the development of robust computer vision algorithms, but also an exciting opportunity to develop new algorithms in the fields of food data analysis and food digitization.
Workshop
Azadeh Dinparastdjadid · Žan Gojčič · Maximilian Igl · Maximilian Naumann · Thomas Gilles · Ekaterina Tolstaya · Sanja Fidler · Shimon Whiteson
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://agents4ad.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Congyue Deng · Evangelos Chatzipantazis · Jiahui Lei · YINSHUANG XU · Stefanos Pertigkiozoglou · Minghan Zhu · Huazhe Xu · Thomas W. Mitchel · Leonidas Guibas · Kostas Daniilidis
[ 101 C ]
Abstract
Exploiting symmetry in structured data is a powerful way to improve the generalization ability, data efficiency, and robustness of AI systems, which has led to the research direction of equivariant deep learning. Having shown its effectiveness, equivariant deep learning has been widely adopted in a large variety of subareas of computer vision, from 2D image analysis to 3D perception, as well as in further applications such as medical imaging and robotics. The workshop will foster discussion and knowledge exchange among researchers actively working on equivariance, providing a platform to share methodologies and explore the latest advancements in this rapidly evolving field.
Workshop
Nalini Ratha · Srikrishna Karanam · Kuan-Chuan Peng · Mayank Vatsa · Richa Singh · Ziyan Wu · Michele Merler · Kush Varshney
[ Davidson C1-2 ]
Abstract
Workshop
Chen Chen · Guangyu Sun · Nathalie Baracaldo · Yang Liu · Peter Richtárik · Mi Zhang · Lingjuan Lyu · Nicholas Lane · Ang Li · Bo Li · Mahdi Morafah
[ 103 B ]
Abstract
This workshop aims to bring together researchers and practitioners with a common interest in federated learning for computer vision, and is an attempt at studying the different synergistic relations in this interdisciplinary area. This day-long event will facilitate interaction among students, scholars, and industry professionals from around the world to discuss future research challenges and opportunities.
Workshop
Michael Ying Yang · Pietro Morerio · Paolo Rota · Bodo Rosenhahn · Vittorio Murino
[ 106 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://mula-workshop.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
The aim of this workshop is to generate momentum around multimodal learning and applications and to encourage interdisciplinary interaction and collaboration between the computer vision, multimedia, remote sensing, and robotics communities, serving as a forum for research groups from academia and industry. We expect contributions involving, but not limited to, image, video, audio, depth, IR, IMU, laser, text, drawing, and synthetic data. Position papers with feasibility studies, as well as work on cross-modality issues with a highly applicative flair, are also encouraged. Multimodal data analysis is an important bridge among vision, multimedia, remote sensing, and robotics, and we therefore expect a positive response from these communities.
Workshop
Yixin Chen · Baoxiong Jia · Yao Feng · Songyou Peng · Chuhang Zou · Sai Kumar Dwivedi · Yixin Zhu · Siyuan Huang · Derek Hoiem · Marc Pollefeys · Song-Chun Zhu
[ 106 C ]
Abstract
The developments in computer vision, graphics, and robotics have jointly spurred calls for next-generation AI systems that physically interact with their surroundings. Current research advances encompass 3D representations, large-scale foundation models, and end-to-end VLA approaches, but fundamental questions remain on how best to sustain environment comprehension, align efforts from diverse fields, and integrate scene understanding techniques to enhance physical interaction. The workshop seeks to unite current efforts, educate an interdisciplinary workforce with expertise across fields, and promote future developments in embodied and general AI.
Workshop
Wentao Zhu · Fangchen Liu · Bike Zhang · He Wang · Li Yi · Koushil Sreenath · Yizhou Wang · Pieter Abbeel · Leonidas Guibas
[ 101 D ]
Abstract
Workshop
Andrew Owens · Jiajun Wu · Kristen Grauman · Antonio Torralba · William Freeman · Andrew Zisserman · Hang Zhao · Ruohan Gao · Triantafyllos Afouras · Arsha Nagrani · Jean-Charles Bazin
[ 211 ]
Abstract
Since pretty much every video has an audio track, the prospect of learning from paired audio-visual data — either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms — is intuitively appealing, and this workshop will cover recent advances in this direction. But it will also touch on higher-level questions, such as what information sound conveys that vision doesn’t, the merits of sound versus other “supplemental” modalities such as text and depth, and the relationship between visual motion and sound. We’ll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing.
Workshop
Shu Kong · Neehar Peri · Yu-Xiong Wang · Andrew Owens · Abhinav Shrivastava
[ 104 C ]
Abstract
Visual perception is crucial for a wide range of applications. Traditionally, visual perception models were developed under a closed-world paradigm, where data distributions and categorical labels were assumed to be fixed and known in advance. However, these closed-world models often prove brittle when deployed in the real open world, which is dynamic, vast, and unpredictable. Modern approaches to visual perception have shifted towards open-world models, such as pretraining foundation models on large datasets sourced from the open world (e.g., data collected from the Internet). These foundation models are then adapted to solve specific downstream tasks. While contemporary model training follows the principle of "open-world learning," our workshop seeks to address existing limitations, potential risks, new opportunities, and challenges.
Workshop
Utkarsh Mall · Ye Zhu · Jacob Berv · Siavash Golkar · Katherine Bouman · Subhransu Maji · David Fouhey
[ 205 C ]
Abstract
This workshop aims to bring together researchers working on computer vision and diverse scientific domains to discuss the latest advancements, challenges, and opportunities at their intersections. The goal is to foster interdisciplinary collaboration, build community among computer vision researchers, and highlight progress and researchers at the interface of computer vision and the sciences. AI advancements have become a transformative force, extending beyond their original domain to drive breakthroughs in scientific discovery—an impact highlighted by the 2024 Nobel Prizes in Physics and Chemistry. Computer vision, as one of the core areas in AI research, offers powerful tools for analyzing data, with applications spanning a wide range of scientific fields, from accelerating discoveries in astrophysics and biology to enhancing environmental monitoring and materials science.
Workshop
László A. Jeni · Morteza Ziyadi · Hao Yang · Xu Zhang · Yang Zou · Zhaowei Cai · Maria Zontak · Davide Modolo · Ashwin Swaminathan · Liuyue Xie · Mosam Dabhi · Xiang Yue · Ce Zheng · Rohan Choudhury · Ananya Bal
[ 202 C ]
Abstract
Workshop
Nico Lang · Elijah Cole · Suzanne Stathatos · Lukas Picek · Klara Janouskova · Christine Kaeser-Chen · Justin Kay · Joakim Bruslund Haurum · Xiangteng He · Mehmet Aygun · Serge Belongie · Oisin Mac Aodha · Subhransu Maji · Sara Beery · Grant Horn
[ 104 E ]
Abstract
FGVC12 will explore topics of broad interest to the computer vision community, specifically addressing self-supervision, limited data, and human-in-the-loop learning through the challenging lens of fine-grained learning. This focus extends beyond traditional computer vision, offering methodologies applicable to real-world scenarios in domains like ecology, biology, medicine, and art history, thus fostering participation from researchers outside the CVPR community. The workshop will feature innovative challenges, building upon successful past competitions like iNaturalist, which have previously introduced new datasets and fostered novel solutions. FGVC12 will feature not only leading researchers from the field of computer vision, but also experts from domains such as biomedical data science and ecology to promote discussion of open problems in these disciplines.
<br>
<br>FGVC12 acknowledges the support from our Gold Sponsor Google DeepMind.
Workshop
Kristina Monakhova · Mark Sheinin · Fei Xia · Vishwanath Saragadam
[ 205 A ]
Abstract
This workshop is designed to unite the computational camera and display communities by considering to what degree concepts from computational cameras can inform the design of emerging computational displays, and vice versa, with a focus on applications in computer vision. The Computational Cameras and Displays (CCD) workshop series serves as an annual gathering place for researchers and practitioners who design, build, and use computational cameras, displays, and imaging systems for a wide variety of uses. The workshop solicits poster and demo submissions on all topics relating to computational imaging systems.
Workshop
Adam Kortylewski · Fangneng Zhan · Tian Han · Alan L. Yuille · Christian Theobalt
[ Grand A2 ]
Abstract
This workshop aims to foster collaboration between researchers in generative AI and computer vision to explore how visual recognition can benefit from recent advances in generative image modeling. The workshop will feature expert discussions on research results and future directions, specifically focusing on topics such as generative models as data source for training computer vision models, benchmarking with generative models, analysis-by-synthesis approaches, self-supervised learning, adversarial robustness, out-of-distribution generalization, and ethical considerations within generative modeling.
Workshop
Tianjian Jiang · Manuel Kaufmann · Jie Song · Soyong Shin · Jiye Lee · Ye Yuan · Otmar Hilliges
[ 105 B ]
Abstract
The Global 3D Human Poses (G3P) workshop focuses on innovative techniques that incorporate trajectory data into pose estimation. By fostering collaboration among researchers and practitioners, the workshop will delve into new methodologies, address emerging challenges, and discuss the transformative potential of global pose estimation. Ultimately, the insights and innovations presented here are poised to push the boundaries of computer vision and pave the way for more robust, real-world applications in interactive systems and beyond.
Workshop
Vincent Casser · Alexander Liniger · Jose M. Alvarez · Maying Shen · Jannik Zürn · Chiyu “Max” Jiang · Nadine Chang · Dragomir Anguelov · John Leonard · Luc Van Gool
[ Grand C1 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://cvpr2025.wad.vision/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
The CVPR 2025 Workshop on Autonomous Driving (WAD) brings together leading researchers and engineers from academia and industry to discuss the latest advances in autonomous driving. Now in its 8th year, the workshop has continuously evolved with this rapidly changing field and covers all areas of autonomy, including perception, behavior prediction, and motion planning. In this full-day workshop, our keynote speakers will provide insights into the ongoing commercialization of autonomous vehicles, as well as progress in related fundamental research areas. Furthermore, we will host a series of technical benchmark challenges to help quantify recent advances in the field, and invite authors of accepted workshop papers to present their work.
Workshop
Vivek Sharma · Shyamal Buch · Anurag Arnab · Ali Diba · Mohsen Fayyaz · Luc Van Gool · Joao Carreira · Manohar Paluri · Ehsan Adeli · Jürgen Gall · David A. Ross
[ Davidson A1 ]
Abstract
In recent years, the ability of computer systems to classify and analyze online videos has greatly improved. Significant advancements have been made in specific video recognition tasks, such as action and scene recognition. However, the comprehensive understanding of videos, known as holistic video understanding (HVU), has not received the attention it deserves. Current video understanding systems are specialized, focusing on narrow tasks.<br><br>For real-world applications like video search engines, media monitoring systems, and defining a humanoid robot's environment, integrating state-of-the-art methods is essential. To address this need, we are hosting a workshop focused on HVU. This workshop will cover recognizing scenes, objects, actions, attributes, and events in real-world videos.<br><br>We are introducing our HVU dataset, organized hierarchically with a semantic taxonomy for holistic video understanding. While many existing datasets focus on human action or sport recognition, our new dataset aims to broaden the scope and draw attention to the potential for more comprehensive video understanding solutions.<br><br>Our workshop will gather ideas related to multi-label and multi-task recognition in real-world videos, using our dataset to test and showcase research efforts.
Workshop
Shizhe Chen · Ricardo Garcia Pinel · Jiafei Duan · Dieter Fox · Cordelia Schmid · Ivan Laptev · Sami Haddadin
[ 110 B ]
Abstract
Robotic manipulation is one of the most fascinating and challenging problems in robotics, with broad applications in manufacturing, customer service, healthcare, household tasks and more. While learning-based visual policies have achieved impressive results such as manipulating Rubik’s cubes, they are typically trained and tested in the same environments on specific tasks, lacking generalization capabilities to new scenes, objects and tasks. Recently, foundation models such as large language models (LLMs) and vision-language models (VLMs) have demonstrated strong abilities to encode vast amounts of world knowledge and generalize to new domains, offering a promising path forward for enhancing robots’ generalization capabilities. In this workshop, we aim to unite researchers from different communities to push the boundaries of generalizable robotic manipulation, including foundation models, perception, planning, embodied AI, simulators, sim2real, among others.
Workshop
Jianwei Yang · Chunyuan Li · Jiasen Lu · Reuben Tan · Qianhui Wu · Baolin Peng · Mu Cai · Xuehai He · Hao Zhang · Tianhe Ren · Feng Li · Shilong Liu · Xueyan Zou · Zhengyuan Yang · Xin Wang · Yong Jae Lee · Lei Zhang · Jianfeng Gao
[ 101 B ]
Abstract
As artificial intelligence continues to evolve, the intersection of vision and language models is becoming increasingly crucial for real-world applications. The 4th Workshop on Computer Vision in the Wild (CVinW) at CVPR 2025 aims to foster discussions and innovations that push the boundaries of computer vision systems in unconstrained environments. Building on the success of our previous workshops: CVPR 2024 CVinW Workshop, CVPR 2023 CVinW Workshop and ECCV 2022 CVinW Workshop, this edition will focus on the next generation of large multimodal models (LMMs) and vision-language-action (VLA) systems, with an emphasis on temporal reasoning, video understanding, and physical interaction.
Workshop
Sam Tsai · Ji Hou · Jialiang Wang · Yaqiao Luo · Simran Motwani · Xiaoliang Dai · Peizhao Zhang · Kunpeng Li · Peter Vajda · Tao Xu · Chih-Yao Ma
[ 110 A ]
Abstract
We are proud to announce the launch of the 2nd GenAI Media Generation Challenge (MAGIC), featuring a media generation track and an auto-evaluation track. Media Generation Festival: for the first time, we are organizing a media generation festival with no restrictions on prompts. We will define several topics in which submitted media will compete, and participants can submit their best generated videos or images for those topics. For each topic, a crowd-sourced voting mechanism will determine the winners. Auto-Evaluation Challenge: we are introducing an auto-evaluation challenge for both text-to-image and text-to-video tasks. Participants can develop and submit auto-evaluation scores for a preselected set of images and videos that we will provide, drawn from entries to the media generation festival track. Auto-evaluation submissions are expected to predict the outcomes of the crowd-sourced voting in the media generation festival; the auto-evaluation method that achieves the best correlation with the final results will win this challenge.
Workshop
Mubarak Shah · Larry S. Davis · Rene Vidal · Son Dinh Tran · Angela Yao · Salman Khan · Rita Cucchiara · Cees G. M. Snoek · Christoph Feichtenhofer · Chang Xu · Jayakrishnan Unnikrishnan · Afshin Dehghan · Mamshad Nayeem Rizve · Rohit Gupta · Swetha Sirnam · Ashmal Vayani · Omkar Thawakar · Muhammad Uzair Khattak · Dmitry Demidov
[ Grand A1 ]
Abstract
This workshop will explore the evolution, applications, and challenges of Video Large Language Models (VidLLMs), the latest advancement in multimodal LLMs. It will feature keynote talks from leading researchers, a panel discussion comparing VidLLMs with expert models, and a poster session. The workshop also includes three challenge tracks designed to evaluate VidLLMs' capabilities in compositional video retrieval, complex video reasoning and robustness, and multilingual video reasoning. These tracks aim to address key research areas such as training VidLLMs, their application in specialized computer vision tasks, and the challenges in evaluating their performance. Potential topics for invited papers include VidLLM methods/algorithms, data creation, evaluation and analysis, best practices, applications, and limitations, risks and safety. <br>
Workshop
Jieyu Zhang · Cheng-Yu Hsieh · Zixian Ma · Rundong Luo · Shobhita Sundaram · Wei-Chiu Ma · Ranjay Krishna
[ Grand C2 ]
Abstract
Workshop
Hongyang Li · Kashyap Chitta · Andrei Bursuc · Christos Sakaridis · Jonah Philion · Florent Bartoccioni · Ana-Maria Marcu · Huijie Wang
[ Grand B1 ]
Abstract
Autonomous systems, such as robots and self-driving cars, have rapidly evolved over the past decades, yet several problems remain. Attempts have been made to develop more capable autonomous systems, for example by integrating foundation models and utilizing large-scale data, but the most challenging problems have yet to be solved.<br><br>The motivation behind this workshop is to explore potential solutions and discuss the challenges and opportunities associated with these approaches. We believe that this workshop offers a brand-new perspective on the present and future of autonomous systems, and is necessary for both the robotics and computer vision communities.
Workshop
Guoyu Lu · Friedrich Fraundorfer · Yan Yan · Nicu Sebe · Chandra Kambhamettu
[ 102 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/vocvalc2025) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Visual odometry has attracted substantial interest in computer vision, robotics and mechanical engineering communities, to name a few. This workshop aims to foster scalable algorithms and systems for accurate and real-time visual odometry, addressing the growing demands of location-aware applications. It will explore methods and applications leveraging location cues to enhance scene understanding, city navigation, and other context-rich problems, while emphasizing visual odometry and localization in mobile and robotics domains.
Workshop
Manling Li · Ruohan Zhang · Jiayuan Mao · Wenlong Huang · Qineng Wang · Weiyu Liu · Xiaohan Zhang · Yonatan Bisk · Shenlong Wang · Yunzhu Li · Li Fei-Fei · Jiajun Wu
[ 214 ]
Abstract
An embodied agent is a generalist agent that can take natural language instructions from humans and perform a wide range of tasks in diverse environments. Recent years have witnessed the emergence of Large Language Models as powerful tools for building Large Agent Models, which have shown remarkable success in supporting embodied agents with different abilities such as goal interpretation, subgoal decomposition, action sequencing, and transition modeling (causal transitions from preconditions to post-effects). However, moving from Foundation Models to Embodied Agents poses significant challenges in understanding lower-level visual details and in long-horizon reasoning for reliable embodied decision-making. We will cover the advances in foundation models spanning Large Language Models, Vision-Language Models, and Vision-Language-Action Models. In this workshop, we will comprehensively review existing paradigms for foundation models for embodied agents, focus on their different formulations based on the fundamental mathematical framework of robot learning, the Markov Decision Process (MDP), and present a structured view of the robot’s decision-making process. More information at https://foundation-models-meet-embodied-agents.github.io/cvpr2025.
Workshop
Andrea Pilzer · Martin Trapp · Arno Solin · Gianni Franchi · Andrei Bursuc · Marcus Klasson · Angela Yao · TUAN-HUNG VU · Fatma Güney
[ 102 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://uncertainty-cv.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
The UNcertainty quantification for Computer Vision (UNCV) Workshop aims to raise awareness and generate discussion regarding how predictive uncertainty can, and should, be effectively incorporated into models within the vision community. In the era of Generative AI (GenAI), we find this more crucial than ever. The workshop will bring together experts from machine learning and computer vision to create a new generation of well-calibrated and effective methods that know when they do not know.
Workshop
Polina Kirichenko · Vikram V. Ramaswamy · Kyle Buettner · Sina Malakouti · Tarun Kalluri · Manmohan Chandraker · Adriana Kovashka · Olga Russakovsky
[ 213 ]
Abstract
AI systems should serve all people with diverse values and perspectives around the world. However, as datasets scale, it's widely documented that they exhibit social biases of various forms, which translate to AI systems that cause real-world harm to under-represented demographic groups. A focused investigation of demographic biases in modern foundation models, their real-world impact and mitigation is thus critical to ensure equitable access to future models and their applications. This workshop will highlight diverse voices from around the globe and foster discussion on building inclusive AI.
Workshop
Ronny Haensch · Devis Tuia · Jan D. Wegner · Loic Landrieu · Charlotte Pelletier · Hannah Kerner · Nathan Jacobs
[ 208 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://www.grss-ieee.org/events/earthvision-2025/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Earth Observation (EO) and remote sensing are ever-growing fields of investigation where computer vision, machine learning, and signal/image processing meet. The general objective of the domain is to provide large-scale and consistent information about processes occurring at the surface of the Earth by exploiting data collected by airborne and spaceborne sensors. Earth Observation covers a broad range of tasks, from detection to registration, data mining, and multi-sensor, multi-resolution, multi-temporal, and multi-modality fusion and regression, to name just a few. It is motivated by numerous applications such as location-based services, online mapping services, large-scale surveillance, 3D urban modeling, navigation systems, natural hazard forecast and response, climate change monitoring, virtual habitat modeling, food security, etc. The sheer amount of data calls for highly automated scene interpretation workflows.
Workshop
Jack Langerman · Ruisheng Wang · Dmytro Mishkin · Ilke Demir · Renzhong Guo · Tolga Birdal · Sean Ma · Clement Mallet · Yang Wang · Shangfeng Huang
[ 104 D ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://usm3d.github.io) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Classical 3D reconstruction has traditionally focused on low-level representations, and this workshop addresses the need for higher-level, structured and parametric representations like CAD models from images and point clouds, with implications for construction, manufacturing, urban planning, and related fields. The workshop aims to foster interdisciplinary collaboration between 3D vision researchers, photogrammetry, graphics, machine learning, and other domains where structured 3D representations are critical. To advance research in this area, the workshop introduces two large-scale datasets: S23DR, a collection of 3D models with corresponding multiview images, and Building3D, a city-scale dataset for building wireframe model generation from aerial LiDAR. By providing these resources and promoting collaboration, the workshop seeks to catalyze multi-view structured 3D reconstruction trends, bridge industry-academia gaps, and enable applications in urban planning, disaster management, and other critical areas.
Workshop
Jiafei Duan · Muhammad Zubair Irshad · Ishika Singh · Vitor Guizilini · Rares Andrei Ambrus · Zsolt Kira
[ 101 A ]
Abstract
Workshop
Andrey Ignatov · Radu Timofte
[ 103 C ]
Abstract
Over the past years, mobile AI-based applications have become more and more ubiquitous. Various deep learning models can now be found on any mobile device, from smartphones running portrait segmentation, image enhancement, face recognition, and natural language processing models, to smart-TV boards with sophisticated image super-resolution algorithms. The performance of mobile NPUs and DSPs is also increasing dramatically, making it possible to run complex deep learning models and to achieve fast runtime in the majority of tasks. While many research works targeting efficient deep learning models have been proposed recently, the evaluation of the obtained solutions usually happens on desktop CPUs and GPUs, making it nearly impossible to estimate the actual inference time and memory consumption on real mobile hardware. To address this problem, we introduce the first Mobile AI Workshop, where all deep learning solutions are developed for and evaluated on mobile devices.
Workshop
Shangzhe Wu · Qianqian Wang · Gengshan Yang · Jiahui Lei · Ruoshi Liu · Yufei Ye · Congyue Deng · Tarasha Khurana · Aleksander Holynski · Carl Doersch
[ 104 B ]
Abstract
In recent years, we have seen remarkable progress in 3D computer vision, with increasingly robust and efficient models for reconstructing and generating 3D objects and scenes. 4D computer vision, as a natural extension of these efforts, is rapidly gaining traction. This workshop aims to establish a dedicated venue for discussions on this topic, bringing together researchers across various domains to exchange perspectives, identify challenges, and collectively accelerate progress in this space.
Workshop
Nadine Chang · Maying Shen · Jose M. Alvarez · Sifei Liu · Rafid Mahmood · Despoina Paschalidou
[ 201 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/nexd25/home) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Data is more crucial than ever, having enabled everything from the first generation of deep learning models to the new generation of foundation models. These foundation models are rapidly being incorporated into several safety-critical applications affecting human life. Thus, the large volume of data they rely on must be high-quality for safe model development. Due to the sheer volume of raw data, it is necessary to obtain a scalable ability to rank and select data by its inherent quality and value for both generic and specific tasks. Recently, foundation models themselves have been used to discover even more data to feed into further foundation model training. This cyclic relationship between data and foundation models introduces another layer of complexity and biases to consider. Overall, this enormous challenge of discovering the next generation of data requires several considerations: the definition of quality data, bias-free data, scalability, generating data, ethical data gathering, continuous data gathering, and hallucination-free foundation models for data mining. …
Workshop
Ashkan Khakzar · A. Koepke · Ameya Prabhu · Jindong Gu · Francesco Pinto · Arsha Nagrani · Boyi Li · Philip H.S. Torr · Trevor Darrell
[ 210 ]
Abstract
TLDR: This workshop focuses on analysis and evaluations to understand and identify emerging visual capabilities and pinpoint visual limits in foundation models.<br><br>Visual information processing is being transformed by foundation models. Trained on massive datasets using self-supervised and generative methods, these models exhibit the emergence of sophisticated visual abilities—such as depth perception, object recognition, and part discovery — without explicit programming or supervision. This shift marks a new paradigm where neural models derive visual understanding from the intrinsic structures and patterns present in the data rather than supervisory signals associated with a visual task. Moreover, questions remain about how to systematically analyze and evaluate these emergent capabilities. Recent studies have also highlighted the models' visual limitations, emphasizing the need for innovative evaluation methods to identify these shortcomings. By evaluating and understanding both the capabilities and limits of these models, we can better compare different learning algorithms and architectures in terms of how they represent the visual world.
Workshop
Anand Bhattad · Aditya Prakash · Unnat Jain · Angjoo Kanazawa · Georgia Gkioxari · Svetlana Lazebnik
[ 209 A-C ]
Abstract
In today’s AI landscape, visibility is harder than ever. The pace is breakneck, arXiv is overflowing, and the pressure to perform is real. So how do early-career researchers cut through the noise?<br><br>How do you define a research identity without chasing trends?<br>How do you publish with purpose, not just pace?<br>How do you explore emerging areas without getting lost in the noise?<br>How do you balance mentorship with momentum?<br><br>In its third year, this CVPR community-building workshop brings together voices from across CV, NLP, ML, and Robot Learning -- Andrea, Carl, Dima, Gül, Jia-Bin, Laura, Ludwig, Saining, Sara, and Shuran -- to answer these questions and more. This is an open forum to share insights, frustrations, and hacks — because no one builds a research career alone.
Workshop
Ilya Chugunov · Tzofi Klinghoffer · Shengyu Huang · Wenzheng Chen · Daniel Gilo · Akshat Dave · Lingjie Liu · David B. Lindell · Or Litany · Ramesh Raskar
[ 106 A ]
Abstract
Neural fields have been widely adopted for learning novel view synthesis and 3D reconstruction from RGB images by modelling transport of light in the visible spectrum. This workshop focuses on neural fields beyond conventional cameras, including (1) learning neural fields from data from different sensors across the electromagnetic spectrum and beyond, such as lidar, cryo-electron microscopy (cryoEM), thermal, event cameras, acoustic, and more, and (2) modelling associated physics-based differentiable forward models and/or the physics of more complex light transport (reflections, shadows, polarization, diffraction limits, optics, scattering in fog or water, etc.). Our goal is to bring together a diverse group of researchers using neural fields across sensor domains to foster learning and discussion in this growing area.
Workshop
Sukrut Rao · Indu Panigrahi · Sunnie S. Y. Kim · Vikram V. Ramaswamy · Rajat Sahay · Avinab Saha · Dahye Kim · Miguel-Ángel Fernández-Torres · Lenka Tětková · Teresa Dorszewski · Bartlomiej Sobieski · Marina Gavrilova · Yuhui Zhang · Pushkar Shukla
[ 107 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://xai4cv.github.io/workshop_cvpr25) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Explainability of computer vision systems is critical for people to effectively use and interact with them. This workshop provides a forum for researchers and practitioners to discuss the challenges and opportunities in explainable AI (XAI) for CV, addressing a critical need given the increasing deployment of these systems, by: (1) initiating discussions across researchers and practitioners in academia and industry to identify successes, failures, and priorities in current XAI work; (2) examining the strengths, weaknesses, and underlying assumptions of proposed XAI methods and establishing best practices in the evaluation of these methods; and (3) discussing the various nuances of explainability and brainstorming ways to build explainable CV systems that benefit all involved stakeholders.
Workshop
Tobias Kirschstein · Simon Giebenhain · Tianye Li · Koki Nagano · Justus Thies · Matthias Nießner
[ 110 A ]
Abstract
Photorealistic 3D head avatars will play a crucial role in future computer games, visual effects, movie production, and virtual telepresence. In this workshop, we bring together leading academic researchers and industry experts to discuss the technology behind 3D head avatars, current applications, and future trends. In particular, we focus on two key desiderata of 3D head avatars: achieving the highest possible rendering quality and controlling the avatar with a driving signal. To this end, the workshop hosts a challenge on the NeRSemble 3D Head Avatar Benchmark. Challenge participants are invited to submit their methods for two tasks: dynamic novel view synthesis on heads, and monocular FLAME-driven 3D head avatar reconstruction. The authors of the best-performing submission will receive a GPU prize and present their method alongside invited speakers in the workshop.
Workshop
Rishabh Dabral
[ 110 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://humogen.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Ross Cutler · Julien Valentin · Justus Thies · Babak Naderi · Vishak Gopal
[ 212 ]
Abstract
Workshop
Eshed Ohn-Bar · Danna Gurari · Hernisa Kacorri · Kris Kitani · Chieko Asakawa · Jennifer Mankoff
[ 202 C ]
Abstract
The overarching goal of this workshop is to gather researchers, students, and advocates at the intersection of accessibility, computer vision, and autonomous systems. Building upon the success of the previous CVPR workshop (with cross-disciplinary talks, posters, and challenges), this iteration will focus on addressing the lack of shared development tools and vision-based benchmarks for accessibility systems. The workshop will feature a multimodal challenge with synthetic and real-world benchmarks. By fostering discussion and actively engaging people with disabilities, the workshop aims to build a stronger community for accessibility research within computer vision, uncover research opportunities, and encourage the development of more effective and usable real-world visual reasoning models.
Workshop
Fabio Bellavia · Jiri Matas · Dmytro Mishkin · Luca Morelli · fabio remondino · Amy Tabb · Eduard Trulls · Kwang Moo Yi
[ 108 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://image-matching-workshop.github.io) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Henghui Ding · Nikhila Ravi · Yunchao Wei · Jiaxu Miao · Zongxin Yang · Yi Yang · Si Liu · Yi Zhu · Elisa Ricci · Cees G. M. Snoek · Song Bai · Philip H.S. Torr
[ 105 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://pvuw.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Apratim Bhattacharyya · Fadime Sener · Roland Memisevic · Bugra Tekin · Edoardo Remelli · Shugao Ma · Guodong Ding · Shweta Mahajan · Angela Yao
[ 211 ]
Abstract
Workshop
Xiaohan Wang
[ 107 A ]
Abstract
Workshop
Anoop Cherian · Kuan-Chuan Peng · Suhas Lohit · Honglu Zhou · Le Xue · Kevin A. Smith · Tim Marks · Joshua B. Tenenbaum
[ 207 A-D ]
Abstract
In this workshop, we plan to gather researchers working in neural algorithmic learning, multimodal reasoning, and cognitive models of intelligence to showcase their cutting-edge research, discuss the latest challenges, as well as bring to the forefront problems in perception and language modeling that are often overlooked but are pivotal in achieving true artificial general intelligence. An emphasis of this workshop is on the emerging topic of multimodal algorithmic reasoning, where a reasoning agent is required to automatically deduce new algorithms/procedures for solving real-world tasks, e.g., algorithms that use multimodal foundational models for analysis, synthesis, and planning, new approaches towards solving challenging vision-and-language mathematical (Olympiad type) reasoning problems, deriving winning strategies in multimodal games, procedures for using tools in robotic manipulation, etc. We hope to deep dive into this exciting topic at the intersection of multimodal learning and cognitive science to understand what we have achieved thus far in machine intelligence and what we are lacking in relation to the human way of thinking -- through talks from outstanding researchers and faculty that could inspire the audience to search for the missing rungs on the ladder to true intelligence.
Workshop
Guillermo Gallego · Kostas Daniilidis · Cornelia Fermuller · Davide Migliore · Daniele Perrone
[ Grand C2 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://tub-rip.github.io/eventvision2025) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
The Event-based Vision Workshop at CVPR is the premier venue for discussing exciting new ideas about neuromorphic cameras and their processing methods. It covers the sensing hardware, as well as the processing, data, and learning methods needed to take advantage of event-based cameras. The workshop aims to highlight an emerging field with the potential to overcome many of the limitations of frame-based systems (speed, power consumption, robustness to HDR illumination, etc.). This forum fosters community building around these novel cameras, capitalizing on a growing interest and increasing contributions at the main conference. Furthermore, the workshop seeks to connect with a broader audience by highlighting interdisciplinary links between computer vision, robotics, artificial intelligence, computational neuroscience, and psychology, as event cameras facilitate research into replicating the efficiency and robustness of the human visual system.
Workshop
Yunzhi Zhang · Joy Hsu · Jiayuan Mao · R. Kenny Jones · Himanshu Singh Singh · Daniel Cohen-Or · Shangzhe Wu · Jiajun Wu
[ 101 A ]
Abstract
Visual concept discovery aims to extract compact and structured representations of the visual world, and recompose them to tackle novel intricate problems. It has played a crucial role in many core problems in computer vision research, including both discriminative and generative tasks. An important research question is to understand and design concept representations that facilitate better learning from various datasets and compositional reasoning. As an endeavor to answering this question, in this workshop, we gather together researchers in computer vision, multi-modal learning, machine learning, and cognitive science to discuss the development and interpretation of visual concept learning systems and their applications.
Workshop
Aayush Bansal · Minh Vo
[ 110 A ]
Abstract
Virtual Try-On (VTON) promises to transform the apparel e-commerce industry, offering benefits for shoppers, businesses, and the environment. This workshop will address three key challenges that must be overcome to realize VTON's full potential: achieving high-fidelity, rapid video try-ons; accurately predicting 3D garment size and improving 3D human body reasoning; and defining robust metrics for synthesis quality that avoid offensive results across diverse demographics. Addressing these VTON-specific challenges will necessitate fundamental advancements in generative image and video synthesis, offering broader impact within the computer vision and machine learning communities.
Workshop
Saurabh Prasad · Jocelyn Chanussot · Begüm Demir · Biplab Banerjee · Danfeng Hong
[ 209 A-C ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/morse2025) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
This workshop will feature keynotes and presentations at the cutting-edge of foundation models and large vision models for remote sensing. It will bring together researchers working on both foundation and large vision models and geospatial image analysis to address the nuances presented by using such emergent models for remotely sensed imagery (e.g. a multitude of sensors with different sensing characteristics/specifications, diverse imaging modalities, ranging from passive-optical multi/hyperspectral to active-imaging such as SAR and LiDAR; limited ground-reference data etc.). Our emphasis will range from large vision and foundation models that are showing promise in the computer vision community to foundation models that are pre-trained on large-quantities of earth-observation imagery. This workshop will provide a venue for the community to present works that push the envelope on adapting these models for effective inference of multi-sensor, multi-temporal, multi-scale earth observation imagery.
Workshop
Walter Zimmer · Ross Greer · Max Ronecker · Chuheng Wei · Haibao Yu · Rui Song · Xingcheng Zhou · Holger Caesar · Julie Stephany Berrio Perez · Alina Roitberg · Daniel Watzenig · Mohan Trivedi · Alois Knoll
[ Grand A2 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://drivex-workshop.github.io) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Luca Rossetto · George Awad · Werner Bailer · Cathal Gurrin · Björn Jónsson · Jakub Lokoč · Stevan Rudinac · Klaus Schoeffmann
[ 208 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/ivise2025) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Burhan Yaman · Yunsheng Ma · Xin Ye · Xu Cao · Wenqian Ye · Ana Jojic · Abhirup Mallik · Ziran Wang
[ 208 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://wdfm-ad.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
The 1st Workshop on Distillation of Foundation Models for Autonomous Driving (WDFM-AD) aims to advance the deployment of large foundation models—such as vision-language models (VLMs) and generative AI (GenAI) models—within autonomous driving systems through efficient distillation techniques. Building on the momentum of prior workshops focused on large language and vision models for autonomous driving, WDFM-AD provides a dedicated platform for researchers and industry practitioners to explore methods that bridge cutting-edge foundation model research with real-world deployment, particularly under the stringent latency and resource constraints of autonomous vehicles. By addressing the challenges of compressing, aligning, and deploying foundation models for self-driving, WDFM-AD seeks to accelerate their safe, efficient, and scalable integration into next-generation autonomous driving systems.
Workshop
Amirhossein Habibian · Fatih Porikli · Auke Wiggers · Yung-Hsiang Lu · Vincent Tao Hu · Lanqing Guo · Qinghao Hu
[ 106 C ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/elvm/home) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
This workshop focuses on the core principles of efficiency in large-scale vision models. How do we minimize redundant operations in generative models without compromising quality? Can autoregressive decoding and diffusion sampling be accelerated through parallelization? What are the trade-offs between compression, quantization, and expressivity? We seek to advance new directions in compact model representations, adaptive computation, parallel decoding, and structured sparsity—approaches that go beyond incremental optimizations and redefine how LVMs operate.
<br>
<br>We invite researchers working on fast and scalable vision architectures, low-cost inference, and efficient generative models to share their insights. Whether through sampling acceleration, efficient transformers, new architectural paradigms, or theoretical limits of model compression, this workshop provides a platform to discuss how LVMs can be optimized for both performance and practicality.
<br>
<br>Join us in shaping the next generation of vision models—where efficiency is not just a constraint, but a driving force for innovation.
Workshop
Jianing "Jed" Yang · Shengyi Qian · Yining Hong · Valts Blukis · Xiaojian Ma · Yash Bhalgat · Iro Laina · Joyce Chai · David Fouhey
[ 106 A ]
Abstract
This workshop addresses a critical gap in current AI research by focusing on the integration of language and 3D perception, which is essential for developing embodied agents and robots, especially considering the recent rise of multimodal LLMs and vision-language-action (VLA) models.
<br>
<br>The workshop will explore challenges and opportunities in this area, providing a platform for researchers to share their work, discuss future directions, and foster collaboration across disciplines including robotics, computer vision, natural language processing, and human-computer interaction.
Workshop
Mei Chen · Dimitris N. Metaxas · Steve Finkbeiner · Oren Kraus
[ 214 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://cvmi-workshop.github.io/index.html) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Habib Slim · Xiang Li · Mahmoud Ahmed · Peter Vajda · Wolfgang Heidrich · Mohamed Elhoseiny · Natalia Neverova
[ 110 B ]
Abstract
Workshop
Heng Wang · Prithvijit Chattopadhyay · Ming-Yu Liu · Mike Zheng Shou · Jay Zhangjie Wu · Xihui Liu · Deepti Ghadiyaram · Gowthami Somepalli · Huaxiu Yao · Wenhu Chen · Jiaming Song · Humphrey Shi
[ 108 ]
Abstract
World models are predictive systems that enable Physical AI agents to understand, decide, plan, and analyze counterfactuals through integrated perception, instruction processing, controllability, physical plausibility, and future prediction capabilities. The past year has witnessed significant advancements from both academic and industrial research teams, with various models utilizing different conditioning approaches (text, image, video, control) being released openly and commercially. While these developments enable applications in content creation, autonomous driving, and robotics, the models' diversity in training methods, data sources, architecture, and input processing necessitates critical evaluation. The WorldModelBench workshop aims to address this need by fostering discussions on evaluation criteria (physical correctness, prompt alignment, generalizability), metrics development, standardized methodologies, and crucial topics including accessible benchmarking, quantitative evaluation protocols, downstream task assessment, and safety/bias considerations in world models.
Workshop
Yuankai Huo · Le Lu · Bennett Landman · Daniel Moyer · Jie Wu · Xiaoxiao Li · Chenyu You · Zhiyu Wan · Yucheng Tang · Nourhan Bayasi · Roza Bayrak
[ 101 C ]
Abstract
Workshop
Tianyuan Zhang · Siyang Wu · Aishan Liu · Jiakai Wang · Siyuan Liang · Felix Juefei-Xu · Qing Guo · Xinyun Chen · Yew-Soon Ong · Xianglong Liu · Dawn Song · Alan L. Yuille · Philip H.S. Torr · Dacheng Tao
[ 205 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://cvpr25-advml.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Riad I. Hammoud
[ 205 C ]
Abstract
Workshop
Zhao Dong · Zhaoyang Lv · Zhengqin Li · Jiajun Wu · Hao Su · Manmohan Chandraker · Kalyan Sunkavalli · Jia Deng · Shuang Zhao · Lingjie Liu · Jerome Revaud · Hong-Xing Yu · Yunzhi Zhang · Leonidas Guibas
[ 102 B ]
Abstract
Despite the growing momentum around 3D reconstruction and generative AI in computer vision, a critical gap remains: how to create photorealistic, fully functional 3D digital twins that are indistinguishable from their real-world counterparts and enable practical applications. This workshop tackles that challenge by spotlighting 3D digital twin creation technologies and their broad impact across AR/VR, spatial and contextual AI, and robotics. Distinguished speakers from diverse disciplines will share cutting-edge digital twin creation techniques and real-world use cases. Additionally, we are excited to launch a benchmark and challenge for 3D digital twin creation, built on our Digital Twin Catalog (DTC) dataset and supported by open-source baselines. This initiative aims to spark meaningful discussion, foster collaboration, and accelerate progress in both academic research and practical deployment.
Workshop
Khoi Nguyen · Anh Tran · Binh-Son Hua · Supasorn Suwajanakorn · Yi Zhou
[ 106 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://syntagen25.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Latha Pemula · Samet Akcay · Toby P. Breckon · Philipp Seeböck · Paul Bergmann · Paula Ramos-Giraldo · Yedid Hoshen · Guansong Pang · Jawad Tayyub · Thomas Brox
[ Davidson A1 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/vand30cvpr2025) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Anomaly detection—also known as novelty or out-of-distribution detection—is a key challenge in computer vision and pattern recognition. From medical imaging to industrial inspection, spotting what doesn’t belong is critical, yet notoriously hard. Why? Because anomalies can take unlimited forms, and most models see only a narrow slice of the possible "normal" during training.<br>The VAND workshop brings together cutting-edge research tackling this open-set problem across supervised, semi-supervised, and unsupervised methods, as well as few-, one-, and zero-shot approaches.<br>This year, we're also hosting two exciting challenges: (1) 'Adapt & Detect – Robust anomaly detection in real-world applications', and (2) 'VLM Anomaly Challenge – Few-shot learning for logical and structural anomaly detection using vision-language models'.<br>Join us to explore the next generation of models that can detect the unexpected.<br>
Workshop
Estefanía Talavera · Deblina Bhattacharjee · Mengwei Ren · Himangi Mittal · Karen Sanchez
[ 105 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/wicv-cvpr-2025/home) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Mennatullah Siam · Stella X. Yu · Sangwoo Mo · Leonid Sigal · Raoul de Charette · Tanzila Rahman · He Zhao · Aoran Xiao
[ 101 E ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/pixfoundation) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
In recent years, foundation models have gained significant traction and success. Such foundation models have been shown to adapt effectively across various downstream tasks, with strong generalization capabilities, especially in zero-shot and few-shot scenarios. There is growing interest and progress specifically in vision foundation models (VFM). Some of the latest models include those trained using self-supervision, such as the DINO series, and those utilizing image/text pairs, such as CLIP, Flamingo, LLaVA, and Cambrian. Various pixel-level vision foundation models have also emerged for image/video referring segmentation and depth estimation. Our workshop aims to bring together researchers dedicated to developing and adapting vision foundation models for pixel-level understanding tasks, including image/video segmentation, referring image/video segmentation and reasoning, tracking, depth estimation, and motion estimation. We will explore major directions in pixel-level understanding with vision foundation models and discuss the opportunities they present, particularly in low-resource settings that could have a positive societal impact. Additionally, we will discuss the risks associated with …
Workshop
Hilde Kuehne · Rogerio Feris · Leonid Karlinsky · Anna Kukleva · Ameya Prabhu · Wei Lin · Muhammad Jehanzeb Mirza · Sivan Doveh · Roei Herzig
[ 207 A-D ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/mmfm3rdworkshop) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Fangyin Wei · Donglai Xiang · Qianli Ma · Yifei Li · Ming Lin · Chenfanfu Jiang · Shenlong Wang · David I.W. · Tsung-Yi Lin
[ 104 A ]
Abstract
This workshop explores the evolving intersection of computer vision and physics, where two competing perspectives—physics-based simulations versus data-driven approaches like video foundation models—seek to model the world effectively. By bringing together researchers from both fields, the event aims to foster collaboration, identify synergies, and advance applications in scientific research, generative AI, robotics, gaming, and extended realities (XR). Through presentations and discussions, the workshop will promote interdisciplinary dialogue to develop next-generation technologies that combine physics-based and data-driven methods, ultimately enhancing realistic simulations for immersive environments, automated tasks, and seamless virtual-physical integration.
Workshop
Mohamed Lakhal · Ozge Mercanoglu Sincan · Edward Fish · Harry Walsh · Gul Varol · Liliane Momeni · Necati Cihan Camgoz · Neil Fox · Kearsy Cormier · Bencie Woll · Richard Bowden
[ Davidson C1-C2 ]
Abstract
Sign languages are visual languages and a key form of communication for deaf communities. Thanks to recent advances in deep learning and computer vision and the availability of larger datasets, significant progress has been made in sign language technologies. Following the first and second editions, this workshop is motivated by the desire to broaden participation in sign language research from the computer vision community. It aims to bring together researchers working on different aspects of vision-based sign language research and sign language linguists to explore recent advances and future directions in sign language recognition, translation, and production.
<br>
<br>Please visit our schedule page for details: [https://slrtpworkshop.github.io/schedule/](https://slrtpworkshop.github.io/schedule/)
Workshop
Siddhant Bansal · Antonino Furnari · Tushar Nagarajan · Dima Damen · Giovanni Maria Farinella · Kristen Grauman · Jitendra Malik · Richard Newcombe · Marc Pollefeys · Yoichi Sato · David Crandall
[ Grand B1 ]
Abstract
Egocentric devices like wearable cameras, smart glasses, and AR/VR headsets are rapidly evolving to automatically recognize user actions, environments, gestures, and social interactions. This workshop serves as a central gathering point for the egocentric vision community to exchange ideas and explore this fast-growing field. It features challenges across five major datasets (EPIC-Kitchens, Ego4D, Ego-Exo4D, HoloAssist, HD-EPIC), keynote talks from leading experts, abstract presentations on emerging ideas, EgoVis awards for seminal papers from 2023/2024, and poster sessions on pivotal papers—offering a comprehensive look at the future of egocentric perception and wearable AI.
Workshop
Kwan-Yee Lin · Wayne Wu · Bolei Zhou · Matthias Nießner · Stella X. Yu
[ 101 B ]
Abstract
This workshop aims to explore the pathway toward building “Embodied Humans”—intelligent humanoid agents capable of both physical action and cognitive reasoning like humans—where the boundary between digital avatars and physical humanoid robots could be dissolved through their co-evolution across virtual and real worlds. We will examine the possibility of this synergy through three core dimensions: 1) how can humanoid robots learn foundational “genes” from avatars? 2) how can virtual humans gain physical plausibility from robots' embodiment to enrich realism and interactivity? and 3) how can both systems develop self-autonomy to perceive, plan, and act in dynamic, open-ended environments? Featuring academic researchers and industry experts as invited speakers and panelists, the workshop brings together perspectives from virtual avatar modeling and humanoid robot learning to explore how systems on both ends are progressing toward human-like capacities for perception, reasoning, and movement. Through advanced techniques—such as reinforcement learning, cognition modeling, motion and structure perception, geometric representations, multimodal simulation, and large language/vision/action models—we aim to understand how virtual humans are evolving beyond surface-level realism, and how humanoid robots are advancing beyond pre-scripted skills—enabling both to engage the world with situational understanding, behavioral adaptability, and autonomous intent. At the heart of this workshop lie two essential questions: What makes a …
Workshop
Angela Dai · Yueh-Cheng Liu · Chandan Yeshwanth · Ben Mildenhall · Peter Kontschieder · Matthias Nießner
[ 211 ]
Abstract
Recent advances in generative modeling and semantic understanding have spurred significant interest in the synthesis and understanding of 3D scenes. The potential application areas are significant: augmented and virtual reality, computational photography, interior design, and autonomous mobile robots all require a deep understanding of 3D scene spaces. The ScanNet++ workshop offers the first benchmark challenge for novel view synthesis in large-scale 3D scenes, along with high-fidelity, large-vocabulary 3D semantic scene understanding -- where very complete, high-fidelity ground truth scene data is available. This is enabled by the new ScanNet++ dataset, which offers 1mm resolution laser scan geometry, high-quality DSLR image capture, and dense semantic annotations over 1000 class categories. In particular, existing view synthesis work leverages data captured from a single continuous trajectory, making evaluation of novel views outside of the original capture trajectory impossible. In contrast, our novel view synthesis challenge uses test images captured intentionally outside of the train image trajectory, allowing for comprehensive evaluation of state-of-the-art methods in new, challenging scenarios.
Workshop
Xi Wang · Xianghui Xie · Nikos Athanasiou · Dimitrios Tzionas · Shashank Tripathi · Bharat Lal Bhatnagar · Alexey Gavryushin · Thiemo Alldieck · Muhammed Kocabas · Luc Van Gool · Marc Pollefeys · Gerard Pons-Moll
[ 212 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://rhobin-challenge.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Hang Su · Yinpeng Dong · Jindong Gu · Yichi Zhang · Cihang Xie · Lingjuan Lyu · Jun Zhu · Philip H.S. Torr · Shiguang Shan · Wanli Ouyang
[ 109 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://viscale.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Marina Paolanti · Roberto Pierdicca · fabio remondino · Livio De Luca
[ 210 ]
Abstract
Workshop
James Tompkin · Deqing Sun · Lu Jiang · Lingjie Liu · Fitsum Reda · Jun-Yan Zhu · Krishna Kumar Singh
[ Grand A1 ]
Abstract
AI for content creation plays a crucial role in domains such as photography, videography, virtual reality, gaming, art, design, fashion, and advertising, and lies at the intersection of computer vision, machine learning, computer graphics, and design. This workshop will provide attendees with a slice of cutting-edge techniques within this rapidly evolving field, considering both the fundamental technologies and practical challenges faced by designers and content creators, and will show successful applications of AI and deep learning in content creation. With invited speakers of world-class expertise in content creation, up-and-coming researchers, and posters from authors of submitted workshop papers, the workshop will help all to engage in a day filled with learning, discussion, and network building.
Workshop
Claudia DArpino · Anthony G Francis · Cem Gokmen · Changan Chen · Chengshu Li · Angel Xuan Chang · David Hall · German Ros · Joel Jang · Lamberto Ballan · Luca Weihs · Mike Roberts · Minyoung Hwang · Oleksandr Maksymets · Rachith Prakash · Ram Ramrakhya
[ 101 D ]
Abstract
The Sixth Annual Embodied AI Workshop brings together researchers from computer vision, language, graphics and robotics to share the latest advances in embodied intelligent agents that see, talk, listen, reason, and act in bodies within interactive environments. This year's workshop focuses on Real World Applications, with topics including Embodied AI Solutions, Advances in Simulation, Generative Methods, and Foundation Models. The workshop will feature invited talks, a poster session, and panel discussions. Also, the sixth iteration of the workshop continues its tradition of highlighting several embodied AI challenges that advance the state of the art in the field.
Workshop
Timo Sämann · Oliver Wasenmüller · Markus Enzweiler · Peter Schlicht · Stefan Milz · Thomas Stauner · Joachim Sicking · Claus Bahlmann
[ 104 D ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/saiad-2025/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Perampalli Shravan Nayak · Mehar Bhatia · Qian Yang · Kanishk Jain · Rabiul Awal · David Adelani · Spandana Gella · Siva Reddy · Vered Shwartz · Yash Goyal · Sjoerd Steenkiste · Karolina Stanczak · Aishwarya Agrawal
[ 104 E ]
Abstract
The CVPR community has long focused on evaluating AI systems for their general scene-understanding capabilities. However, as these models are deployed globally, it is essential that they also understand cultural concepts and values, ensuring they cater to the diverse needs of users. This workshop expands computer vision frontiers by bringing together researchers from computer vision, natural language processing, AI ethics, and cultural anthropology to discuss how we can build geo-diverse and culturally aware vision-language models (or AI models in general). First, the workshop will focus on evaluating the types of tasks, benchmarks, and metrics we should develop to advance AI systems' capabilities in this area and explore promising approaches to overcome the challenges. Second, the workshop will benchmark progress in geo-diverse and cultural understanding of vision-language models through the CulturalVQA and GlobalRG challenges, which will test critical abilities such as visual question answering and grounding in culturally diverse scenarios. The insights from this workshop extend beyond computer vision, with significant implications for fields like healthcare, education, and e-commerce, where culturally aligned AI can enhance user experiences. Additionally, the workshop aims to inspire further research in AI ethics, fairness, and responsible AI deployment.
Workshop
Grigorios G. Chrysos · Aggelina Chatziagapi · Blerina Gkotse · Vikas Singh · Sanmi Koyejo · Philip H.S. Torr · Matthew B. Blaschko
[ 107 A ]
Abstract
The shift towards foundation models has overshadowed the unique insights of deep learning theory, resulting in a loss of valuable knowledge and resources for the community. As machine learning and computer vision extend into new domains, such as biology, a deeper understanding of vision tasks becomes increasingly important. This workshop will provide a crucial platform for discussing the systematic challenges of integrating theory and practice. Concretely, to bridge the gap between theoretical research in machine learning and its practical applications, the workshop aims to explore how theoretical tools can be leveraged to perform rigorous worst-case analysis, crucial for deploying machine learning models in safety-critical societal domains like healthcare, education, and sustainability.
Workshop
Mike Zheng Shou · Yiqi Lin · Joya Chen · Ziyun Zeng · Linchao Zhu · Gedas Bertasius · Md Mohaiminul Islam · Gaoang Wang · Wei Li · Matt Feiszli · Lorenzo Torresani · Kristen Grauman · Jitendra Malik
[ 105 A ]
Abstract
Workshop
Tianhong Li · Yilun Xu · Tianyuan Zhang · Tim Dockhorn · Shuang Li · Arash Vahdat · Kaiming He
[ 103 A ]
Abstract
In recent years, diffusion models have rapidly overtaken previous methods to become the dominant approach in visual generative modeling, with widespread applications in generating images, videos, 3D objects, and more. However, these models also come with notable limitations, such as slow generation speeds, limited human intervention during the generation process, and challenges in modeling complex distributions like long videos.<br><br>This year, our Visual Generative Modeling workshop at CVPR aims to explore what lies beyond diffusion models in visual generative modeling. We will discuss novel insights, alternative approaches, and new possibilities in modeling and generating visual data. Join us for a full-day event featuring keynote talks from both academia and industry -- all designed to ignite innovative ideas and novel research in visual generative modeling.
Workshop
Haibao Yu · Jianing Qiu · Yao Mu · Jiankai Sun · Li Chen · Walter Zimmer · Jiaru Zhong · Dandan Zhang · Fei Gao · Shanghang Zhang · Mac Schwager · Ping Luo · Zaiqing Nie · Tianxing Chen · Wenxian Yang · Ruiyang Hao · Chuanye Wang · Jiahao Wang · Siqi Fan
[ 102 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://coop-intelligence.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Tamar Rott Shaham · Yossi Gandelsman · Joanna Materzynska · Rohit Gandikota · Amil Dravid · Ashkan Khakzar · Eli Shechtman · Philip H.S. Torr
[ Grand C1 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/miv-cvpr2025/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Leandra Brickson · Urs Waldmann · Shangzhe Wu · Anna Zamansky · Gengshan Yang · Bastian Wandt · Garvita Allabadi · George Martvel · Andre Telfer · Xiaoxuan Ma
[ 104 C ]
Abstract
Many biological organisms have evolved to exhibit diverse behaviors, and understanding these behaviors is a fundamental goal of multiple disciplines including neuroscience, biology, animal husbandry, ecology, and animal conservation. These analyses require objective, repeatable, and scalable measurements of animal behaviors that are not possible with existing methodologies that leverage manual encoding from animal experts and specialists. Recently, computer vision has been making a significant impact across multiple disciplines by providing new tools for the detection, tracking, and analysis of animal behavior. This workshop brings together experts across fields to stimulate this new field of computer-vision-based animal behavioral understanding.
Workshop
Dawid Rymarczyk · Ilknur Icke · Adriana Borowa · Ana Sanchez-Fernandez · Chao-hui Huang · Anne Carpenter
[ 103 C ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://cvdd-cvpr25.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
The workshop aims to bridge the gap between computer vision, artificial intelligence, and the life sciences, with a focus on transformative advancements in drug discovery. By integrating innovative imaging modalities—such as Spatial Transcriptomics, Cell Painting, and Optical Pooled Screening—with state-of-the-art computer vision techniques, this workshop seeks to foster collaboration between experts in biomedical science, AI, and computer vision.<br><br>The workshop highlights the potential for revolutionizing drug discovery processes, driving faster and more accurate identification of therapeutic targets, and expediting the development of treatments for complex diseases. Addressing pressing challenges like cancer, neurodegenerative disorders, and pandemics, the focus lies on leveraging AI to analyze high-dimensional biological data, enhancing our understanding of disease mechanisms and responses to therapies.<br><br>For the CVPR community, this represents an exciting opportunity to expand beyond traditional image processing tasks into applications with tangible societal impact. By applying computer vision expertise to critical healthcare and pharmaceutical challenges, participants will engage with tasks …
Workshop
Yufei Ye · Homanga Bharadhwaj · Dandan Shan · Wei-Chiu Ma · Shubham Tulsiani · Abhinav Gupta · Michael J. Black
[ 213 ]
Abstract
Workshop
Khoa Luu · Nemanja Djuric
[ 107 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/ieeecvf-cvpr2025-precognition) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Vision-based detection and recognition studies have recently achieved highly accurate performance and have been able to bridge the gap between research and real-world applications. Beyond these well-explored detection and recognition capabilities of modern algorithms, vision-based forecasting will likely be one of the next big research topics in the field of computer vision. Vision-based prediction is one of the critical capabilities of humans, and the potential success of automatic vision-based forecasting will empower and unlock human-like capabilities in machines and robots.<br><br>One example application is in autonomous driving technologies, where vision-based understanding of a traffic scene and prediction of movement of traffic actors is a critical piece of the autonomous puzzle. Another area where vision-based prediction is used is the medical domain, allowing deep understanding and prediction of future medical conditions of patients. However, despite its potential and relevance for real-world applications, visual forecasting or precognition has not been the focus of new theoretical studies and practical …
Workshop
Felix Juefei-Xu · Tingbo Hou · Yang Zhao · Licheng Yu · Zhisheng Xiao · Xiaoliang Dai · Qifei Wang · Tao Xu · Yanwu Xu · Ali Thabet · Qiang Liu · Xuan Ju · Ruiqi Gao · Xi Yin · Haolin Jia · Xide Xia · Peizhao Zhang · Peter Vajda
[ 208 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://cvpr25-edge.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Akshay Gadi Patil · Vidya Narayanan · Haoye Dong · Gerard Pons-Moll · Ming Lin
[ 105 B ]
Abstract
Workshop
Weike Ye · Santosh Suram · Helge Stein · Mathew Cherukara · Jiarui Zhang
[ 110 A ]
Abstract
Workshop
Vage Taamazyan · Aarrushi Shandilya · Agastya Kalra · Huaijin Chen · Krzysztof Choromanski · Martin Sundermeyer · Phil Nelson · Satya Mallick · Stan Birchfield · Tim Salzmann · Tomas Hodan · Yang Qian
[ 210 ]
Abstract
This workshop addresses the gap between cutting-edge computer vision research and its practical application in industrial robotics, specifically targeting challenges in tasks like reliable, scalable, and cost-effective bin picking. The workshop brings together researchers and practitioners to discuss topics including 3D scene understanding, embodied AI, and robot learning, focusing on developing robust solutions by considering factors like embodiment, camera choice, and data needs. Complementing the workshop, the Perception Challenge for Bin Picking offers a practical platform for participants to tackle real-world 6DoF pose estimation problems using robot-in-the-loop evaluation, providing a more realistic performance assessment than traditional vision-only metrics. The workshop and challenge together aim to accelerate the adoption of vision-guided robotics and enhance industrial automation efficiency.
Workshop
Rikke Gade · Anthony Cioppa · Thomas B. Moeslund · Graham Thomas · Adrian Hilton · Jim Little · Michele Merler · Silvio Giancola
[ 108 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://vap.aau.dk/cvsports/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Sports is said to be the social glue of society. It allows people to interact irrespective of their social status, age, etc. With the rise of the mass media, a significant quantity of resources has been channeled into sports in order to improve understanding, performance, and presentation. For example, areas like performance assessment, which were previously mainly of interest to coaches and sports scientists, are now finding applications in broadcast and other media, driven by the increasing use of on-line sports viewing, which provides a way of making all sorts of performance statistics available to viewers. Computer vision has recently started to play an important role in sports, as seen, for example, in football, where real-time computer vision-based graphics enhance different aspects of the game. Computer vision algorithms have huge potential in many aspects of sports, ranging from automatic annotation of broadcast footage through to better understanding of sport injuries, coaching, and enhanced viewing. …
Workshop
Dimitrios Kollias · Stefanos Zafeiriou · Irene Kotsia · Panagiotis Tzirakis · Eric Granger · Simon Bacon · Marco Pedersoli · Alan Cowen
[ 202 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://affective-behavior-analysis-in-the-wild.github.io/8th) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
The ABAW Workshop is a premier platform highlighting the latest advancements in multimodal analysis, generation, modeling, and understanding of human affect and behavior in real-world, unconstrained environments. It emphasizes cutting-edge systems that integrate facial expressions, body movements, gestures, natural language, voice and speech to enable impactful research and practical applications. The workshop fosters interdisciplinary collaboration across fields such as computer vision, AI, human machine interaction, psychology, robotics, ethics & healthcare. The workshop further addresses complex challenges like algorithmic fairness, demographic bias & data privacy, making it a vital forum for building equitable, generalizable & human-centered AI systems. By uniting experts from academia, industry & government, the workshop promotes innovation, drives knowledge exchange, and inspires new directions in affective computing, behavior modelling and understanding & human-computer interaction. Finally, the Workshop includes a Competition with 6 challenges, including valence-arousal estimation, basic & compound expression recognition, action unit detection, emotional mimicry intensity estimation and ambivalence/hesitancy recognition.
Workshop
Qixing Huang · Congyue Deng · Lin Gao · Hanwen Jiang · Lingjie Liu · Biao Zhang · Ruqi Huang · Anand Bhattad · Roni Sengupta · Despoina Paschalidou
[ Davidson C3 ]
Abstract
Workshop
Ozgur Kara · Fabian Caba Heilbron · Anyi Rao · Victor Escorcia · Ruihan Zhang · Mia Tang · Dong Liu · Maneesh Agrawala · James Rehg
[ 207 A-D ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://cveu.github.io/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Visual content creation is booming, yet producing engaging visual content remains a challenging task. This workshop aims to highlight machine learning technologies that accelerate and enhance creative processes in visual content creation and editing, including image animation, text-to-visual content generation, and content translation. Moreover, we believe that advancing technology to better understand edited visual content can enable novel platforms for creating compelling media. We seek to bridge the gap between technical and creative communities by bringing together researchers from computer vision, graphics, and the arts, fostering interdisciplinary collaboration and exploring opportunities in this under-explored area.
Workshop
Muhammad Haris Khan · Biplab Banerjee
[ 211 ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://cvpr25workshop.m-haris-khan.com) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Chris Padwick · Naira Hovakimyan · Jing Wu · Kai Wang
[ 107 B ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://www.agriculture-vision.com/) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
With the recent success of computer vision and deep learning in various applications, there has been significantly increasing attention towards its use in agriculture, presenting both significant economic and social opportunities. The 6th Annual International Workshop and Prize Challenge on Agriculture-Vision aims to foster research and applications at the intersection of computer vision and agriculture, addressing challenges in real-world agricultural scenarios, building on a strong record from prior editions at CVPR 2020-2024. The workshop will feature a computer vision challenge and invited speakers from diverse academic and industry backgrounds, including computer vision, robotics, and agriculture, as well as top industry practitioners. This event provides a platform to showcase current progress in interdisciplinary areas and encourage further research and development of foundation models in agriculture.
Workshop
Francis Engelmann · Ayça Takmaz · Jonas Schult · Alexandros Delitzas · Elisabetta Fedele · Zuria Bauer · katerina adam · Or Litany · Federico Tombari · Marc Pollefeys · Leonidas Guibas
[ 105 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://opensun3d.github.io/index.html) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Viraj Prabhu · Prithvijit Chattopadhyay · Sriram Yenamandra · Hao Liang · Krish Kabra · Guha Balakrishnan · Judy Hoffman · Pietro Perona
[ 208 B ]
Abstract
With the increasing adoption of machine learning models in high-stakes applications, rigorous audits of model behavior have assumed paramount importance. However, traditional auditing methods fall short of being truly experimental, as they rely on wild-caught observational data that has been manually labeled. Enter generative techniques, which have recently shown impressive capabilities in automatically generating and labeling high-quality synthetic data at scale. Critically, many such methods allow for the isolation and manipulation of specific attributes of interest, paving the path towards robust experimental analysis.<br><br>This workshop is dedicated to exploring techniques for auditing the behavior of machine learning models – including (but not limited to) performance, bias, and failure modes – by the controlled synthesis (via generation or simulation) of data. Of special interest are algorithms for generating data (images, text, audio, etc.) and benchmarking that provide reliable insights into model behavior by minimizing the impact of potential confounders. We also welcome work on the broader topic of using synthetic or quasi-synthetic data for model debugging, broadly construed, with the goal of providing a venue for interdisciplinary exchange of ideas on this emerging topic.
Workshop
Danna Gurari
[ Davidson C1-C2 ]
Abstract
Workshop
Gedas Bertasius · Rohit Girdhar · Zhiding Yu · Lucas Beyer · Gul Varol · Alaaeldin El-Nouby · Tyler Zhu · Feng Cheng · Yan-Bo Lin · Md Mohaiminul Islam · Yi-Lin Sung · Jaemin Cho · Ce Zhang · Yue Yang · Ziyang Wang · Mohit Bansal · Shilong Liu · Hao Zhang · Fuxiao Liu · Xiaolong Li · Subhashree Radhakrishnan · Shiyi Lan · Jose M. Alvarez
[ 209 A-C ]
Abstract
Workshop
Qianli Ma · Siwei Zhang · Shashank Tripathi · Rawal Khirodkar · Yan Zhang · Yao Feng · Georgios Pavlakos · Siyu Tang
[ 110 B ]
Abstract
Workshop
Jian Zhao · Jianan Li · Lei Jin · Miguel Bordallo Lopez · Liang Li · Xinyi Ying · Tianyang Xu · Yawen Cui · SADAF GULSHAD · Shin’ichi Satoh · Hongyuan Zhang · Jianshu Li · Jiaojiao Zhao · Zhi-Qi Cheng · Mengmi Zhang · Zaiping Lin · Miao Li · Zheng Wang · Zechao Li · Yunchao Wei · Junliang Xing · shen Jane · Qi Wang · Xuelong Li
[ Davidson A1 ]
Abstract
Workshop
Adriana Romero-Soriano · Reyhane Askari · Melissa Hall · Michal Drozdzal · Ye Zhu · Agata Lapedriza · Arantxa Casanova · Negar Rostamzadeh · Utsav Prabhu · Pinar Yanardag
[ 106 A ]
Abstract
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please be sure to cross reference the [workshop's website](https://sites.google.com/view/cvpr-responsible-genai) to verify workshop schedule details if they are available on the workshop's website. (Added by CVPR.)
Workshop
Wayne Wu · Bolei Zhou · Katerina Fragkiadaki
[ Davidson A2-A3 ]
Abstract
Workshop
Anagh Malik · Benjamin Attal · Dor Verbin · Xuaner Zhang · James Tompkin
[ 106 C ]
Abstract
3D computer vision has become fundamental to technologies ranging from medical imaging to astronomy and from AR/VR to embodied intelligence. New sensors and imaging modalities like structured-light, time-of-flight, and light field microscopy are being developed to make 3D vision more tractable; but even with new types of sensor data, many problems in 3D vision tend to be ill-posed, and hence we often rely on heuristics or data-driven priors to solve them. Unfortunately, these priors can fail in certain cases, especially for problems where ground truth data is not available, or for niche sensors where capturing large datasets is not feasible. A promising, but often overlooked, alternative is to incorporate knowledge of physics (e.g. physical light transport) into 3D computer vision algorithms, which can better constrain the solutions that they produce.<br><br>The goal of this workshop is to highlight work in 3D computer vision and imaging that makes use of physics-inspired modeling and physical priors, showcasing their importance even with the prevalence of neural priors and big data. Examples include methods that apply physics-based approaches to inverse rendering, 3D microscopy, tomography, and light-in-flight imaging; or methods that combine such approaches with novel tools like neural radiance fields (NeRFs), 3D Gaussian Splatting (3DGS), …
Workshop
Matteo Poggi · Fabio Tosi · Ripudaman Singh Arora · Anton Obukhov · Jaime Spencer · Chris Russell · Simon Hadfield · Richard Bowden
[ 109 ]
Abstract