Registration Desk: Registration / Badge Pickup Thu 12 Jun 07:00 a.m.
Workshop: Three things everyone should ask about photorealistic virtual try-on. Thu 12 Jun 08:00 a.m.
Virtual Try-On (VTON) promises to transform the apparel e-commerce industry, offering benefits for shoppers, businesses, and the environment. This workshop will address three key challenges that must be overcome to realize VTON's full potential: achieving high-fidelity, rapid video try-ons; accurately predicting 3D garment size and improving 3D human body reasoning; and defining robust metrics for synthesis quality that avoid offensive results across diverse demographics. Addressing these VTON-specific challenges will necessitate fundamental advancements in generative image and video synthesis, offering broader impact within the computer vision and machine learning communities.
Workshop on Distillation of Foundation Models for Autonomous Driving Thu 12 Jun 08:00 a.m.
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please cross-reference the workshop's website, where available, to verify schedule details. (Added by CVPR.)
The 1st Workshop on Distillation of Foundation Models for Autonomous Driving (WDFM-AD) aims to advance the deployment of large foundation models—such as vision-language models (VLMs) and generative AI (GenAI) models—within autonomous driving systems through efficient distillation techniques. Building on the momentum of prior workshops focused on large language and vision models for autonomous driving, WDFM-AD provides a dedicated platform for researchers and industry practitioners to explore methods that bridge cutting-edge foundation model research with real-world deployment, particularly under the stringent latency and resource constraints of autonomous vehicles. By addressing the challenges of compressing, aligning, and deploying foundation models for self-driving, WDFM-AD seeks to accelerate their safe, efficient, and scalable integration into next-generation autonomous driving systems.
C3DV: 3rd Workshop on Compositional 3D Vision Thu 12 Jun 08:00 a.m.
Tutorial: Orhun Aydin
Geospatial Computer Vision and Artificial Intelligence for Large-Scale Earth Observation Data
Earth observation (EO) data has applications in agriculture, disaster management, and security. This tutorial explores integrating CV and EO data using diverse sensing types. Attendees will learn about open-source tools, multimodal reasoning, geospatial foundation models, and hands-on analysis of EO data for environmental and climate monitoring.
Tutorial: Sean Fanello
Sense, Perceive, Interact & Render on Android XR
This tutorial details the perception stack built for Android XR, including head, hand, face, and eye tracking. It covers data capture, rendering, photorealistic avatars, and scene understanding. Use cases highlight the stack's application in immersive and interactive experiences.
Tutorial: Srikumar Ramalingam
Efficient Text-to-Image/Video modeling
We are witnessing groundbreaking results from text-to-image and text-to-video models. However, the generation process in these models is iterative and computationally expensive, and there is a growing need to make these algorithms faster so they can serve millions of users efficiently. This course focuses on techniques such as progressive parallel decoding, distillation methods, and Markov Random Fields to accelerate text-to-image and text-to-video models. The course also critiques popular evaluation techniques such as FID and introduces efficient alternatives such as CMMD.
WorldModelBench: The First Workshop on Benchmarking World Foundation Models Thu 12 Jun 08:00 a.m.
World models are predictive systems that enable Physical AI agents to understand, decide, plan, and analyze counterfactuals through integrated perception, instruction processing, controllability, physical plausibility, and future prediction capabilities. The past year has witnessed significant advancements from both academic and industrial research teams, with various models utilizing different conditioning approaches (text, image, video, control) being released openly and commercially. While these developments enable applications in content creation, autonomous driving, and robotics, the models' diversity in training methods, data sources, architecture, and input processing necessitates critical evaluation. The WorldModelBench workshop aims to address this need by fostering discussions on evaluation criteria (physical correctness, prompt alignment, generalizability), metrics development, standardized methodologies, and crucial topics including accessible benchmarking, quantitative evaluation protocols, downstream task assessment, and safety/bias considerations in world models.
Tutorial: Chris Padwick
Multi-Modal Computer Vision and Foundation Models In Agriculture in conjunction with IEEE CVPR 2025
With the recent success of computer vision and deep learning in various applications, there has been significantly increasing attention paid to its use in agriculture. Agriculture-related vision problems are of great economic and social value. For example, robotics has recently been reinvigorated with work on Vision-Language-Action models. Building on these successes, researchers are using multi-modal computer vision foundation models to make progress on agricultural tasks and topics. Some relevant examples include: 1) Agricultural models that leverage data from different remote sensing platforms; 2) Multi-temporal yield prediction models using unsupervised domain adaptation; 3) Multi-modal models for identifying pests and weeds. This tutorial will encourage research in ML, CV, and agriculture, featuring leading researchers discussing the evolution and trends in this field.
5th International Workshop on Event-based Vision Thu 12 Jun 08:00 a.m.
The Event-based Vision Workshop at CVPR is the premier venue for discussing exciting new ideas about neuromorphic cameras and their processing methods. It covers the sensing hardware, as well as the processing, data, and learning methods needed to take advantage of event-based cameras. The workshop aims to highlight an emerging field with the potential to overcome many of the limitations of frame-based systems (speed, power consumption, robustness to HDR illumination, etc.). This forum fosters community building around these novel cameras, capitalizing on a growing interest and increasing contributions at the main conference. Furthermore, the workshop seeks to connect with a broader audience by highlighting interdisciplinary links between computer vision, robotics, artificial intelligence, computational neuroscience, and psychology, as event cameras facilitate research into replicating the efficiency and robustness of the human visual system.
Tutorial: Lukas Picek
This tutorial introduces the field of individual animal re-identification (ReID), crucial for ecological monitoring, conservation, and ethical wildlife research. Accurate animal ReID supports long-term monitoring of endangered species, combatting poaching, and understanding animal behavior. This half-day hybrid tutorial includes multiple talks and a panel discussion to encourage interaction and research directions.
Workshop: DriveX - Foundation Models for V2X-Based Cooperative Autonomous Driving Thu 12 Jun 08:00 a.m.
1st International Workshop on Interactive Video Search and Exploration (IViSE) Thu 12 Jun 08:00 a.m.
Workshop on Foundation and Large Vision Models in Remote Sensing Thu 12 Jun 08:00 a.m.
This workshop will feature keynotes and presentations at the cutting edge of foundation models and large vision models for remote sensing. It will bring together researchers working on foundation and large vision models and on geospatial image analysis to address the nuances of applying such emergent models to remotely sensed imagery (e.g., a multitude of sensors with different sensing characteristics/specifications; diverse imaging modalities, ranging from passive-optical multi/hyperspectral to active imaging such as SAR and LiDAR; limited ground-reference data; etc.). Our emphasis will range from large vision and foundation models that are showing promise in the computer vision community to foundation models that are pre-trained on large quantities of earth-observation imagery. This workshop will provide a venue for the community to present works that push the envelope on adapting these models for effective inference of multi-sensor, multi-temporal, multi-scale earth observation imagery.
Workshop: Efficient Large Vision Models Thu 12 Jun 08:00 a.m.
This workshop focuses on the core principles of efficiency in large-scale vision models. How do we minimize redundant operations in generative models without compromising quality? Can autoregressive decoding and diffusion sampling be accelerated through parallelization? What are the trade-offs between compression, quantization, and expressivity? We seek to advance new directions in compact model representations, adaptive computation, parallel decoding, and structured sparsity—approaches that go beyond incremental optimizations and redefine how LVMs operate.
We invite researchers working on fast and scalable vision architectures, low-cost inference, and efficient generative models to share their insights. Whether through sampling acceleration, efficient transformers, new architectural paradigms, or theoretical limits of model compression, this workshop provides a platform to discuss how LVMs can be optimized for both performance and practicality.
Join us in shaping the next generation of vision models—where efficiency is not just a constraint, but a driving force for innovation.
Workshop on 3D-LLM/VLA: Bridging Language, Vision and Action in 3D Environments Thu 12 Jun 08:00 a.m.
This workshop addresses a critical gap in current AI research by focusing on the integration of language and 3D perception, which is essential for developing embodied agents and robots, especially considering the recent rise of multimodal LLMs and vision-language-action (VLA) models.
The workshop will explore challenges and opportunities in this area, providing a platform for researchers to share their work, discuss future directions, and foster collaboration across disciplines including robotics, computer vision, natural language processing, and human-computer interaction.
Tutorial: Nadine Chang
Continuous Data Cycle via Foundation Models
Foundation models are being continuously integrated into applications like autonomous driving and diagnostics. This tutorial explores the data-model feedback loop: how foundation models affect data curation and vice versa. Talks cover leveraging foundation models to build efficient data engines, enhancing model performance, and addressing data relevance, scale, and quality.
Workshop on Visual Concepts Thu 12 Jun 08:00 a.m.
Visual concept discovery aims to extract compact and structured representations of the visual world and recompose them to tackle novel, intricate problems. It has played a crucial role in many core problems in computer vision research, including both discriminative and generative tasks. An important research question is how to understand and design concept representations that facilitate better learning from various datasets and compositional reasoning. As an endeavor to answer this question, this workshop gathers researchers in computer vision, multi-modal learning, machine learning, and cognitive science to discuss the development and interpretation of visual concept learning systems and their applications.
Workshop on Computer Vision for Microscopy Image Analysis Thu 12 Jun 08:00 a.m.
Tutorial: Vishnu Naresh Boddeti
Computer Vision over Homomorphically Encrypted Data
Over the past decade, computer vision (CV) systems have become integral to healthcare, surveillance, and personal devices. The sensitive nature of data and models raises privacy concerns. Fully homomorphic encryption (FHE) allows computations on encrypted data, ensuring privacy. This tutorial explores integrating FHE into CV, addressing its challenges, mathematical foundations, key FHE schemes, SIMD capabilities, and hands-on demonstrations. It covers private and encrypted CV tasks and discusses open research directions.
Tutorial: Fabricio Narcizo
Edge AI in Action: Technologies and Applications
Edge AI in Action is a hands-on tutorial exploring practical tools to develop and deploy AI models on resource-constrained devices. Topics include model optimization, deployment of LLMs and CV models, and integration with cloud-edge architectures. Demonstrations include devices like Raspberry Pi, iPhones, and Androids. Attendees will gain actionable insights into real-world Edge AI.
Tutorial: Chonghao Sima
Robotics 101: An Odyssey from A Vision Perspective
This full-day tutorial offers a vision-focused introduction to robotics. It covers foundational background, technical advancements, key challenges, and emerging directions. With diverse speakers from multiple domains, the tutorial is divided into two sessions: 'Perceive the World' and 'Interact with the World', addressing perception and interaction in robotics.
Tutorial: Viktoria Ehm
3D Shape Analysis: From Classical Optimization to Learning-based Matching
3D shape analysis deals with extracting information from geometric data, with applications in driving, biomedicine, and AR/VR. This tutorial covers classical shape matching methods (linear and quadratic assignment problems), product graph formalisms, learning-based correspondence, spectral methods, and real-world applications. Challenges and future directions are also addressed.
11th Workshop on Medical Computer Vision Thu 12 Jun 08:15 a.m.
PixFoundation: Workshop on Pixel-level Vision Foundation Models Thu 12 Jun 08:30 a.m.
In recent years, foundation models have gained significant traction and success. Such foundation models have been shown to adapt effectively across various downstream tasks, with strong generalization capabilities, especially in zero-shot and few-shot scenarios. There is growing interest and progress specifically in vision foundation models (VFMs). Some of the latest models include those trained using self-supervision, such as the DINO series, and those utilizing image/text pairs, such as CLIP, Flamingo, LLaVA, and Cambrian. Various pixel-level vision foundation models have also emerged for image/video referring segmentation and depth estimation. Our workshop aims to bring together researchers dedicated to developing and adapting vision foundation models for pixel-level understanding tasks, including image/video segmentation, referring image/video segmentation and reasoning, tracking, depth estimation, and motion estimation. We will explore major directions in pixel-level understanding with vision foundation models and discuss the opportunities they present, particularly in low-resource settings where they could have a positive societal impact. Additionally, we will discuss the risks associated with these models and explore methods to mitigate them. The workshop features seven invited talks, mixing emerging and established researchers, along with posters and selective spotlight presentations.
The 5th Workshop of Adversarial Machine Learning on Computer Vision: Foundation Models + X Thu 12 Jun 08:30 a.m.
Workshop: Women in Computer Vision Thu 12 Jun 08:30 a.m.
Workshop: 3D Digital Twin: Progress, Challenges, and Future Directions Thu 12 Jun 08:30 a.m.
Despite the growing momentum around 3D reconstruction and generative AI in computer vision, a critical gap remains: how to create photorealistic, fully functional 3D digital twins that are indistinguishable from their real-world counterparts and enable practical applications. This workshop tackles that challenge by spotlighting 3D digital twin creation technologies and their broad impact across AR/VR, spatial and contextual AI, and robotics. Distinguished speakers from diverse disciplines will share cutting-edge digital twin creation techniques and real-world use cases. Additionally, we are excited to launch a benchmark and challenge for 3D digital twin creation, built on our Digital Twin Catalog (DTC) dataset and supported by open-source baselines. This initiative aims to spark meaningful discussion, foster collaboration, and accelerate progress in both academic research and practical deployment.
Workshop: VAND: Visual Anomaly and Novelty Detection - 3rd Edition Thu 12 Jun 08:30 a.m.
Anomaly detection—also known as novelty or out-of-distribution detection—is a key challenge in computer vision and pattern recognition. From medical imaging to industrial inspection, spotting what doesn’t belong is critical, yet notoriously hard. Why? Because anomalies can take unlimited forms, and most models see only a narrow slice of the possible "normal" during training.
The VAND workshop brings together cutting-edge research tackling this open-set problem across supervised, semi-supervised, and unsupervised methods, as well as few-, one-, and zero-shot approaches.
This year, we're also hosting two exciting challenges: (1) 'Adapt & Detect – Robust anomaly detection in real-world applications', and (2) 'VLM Anomaly Challenge – Few-shot learning for logical and structural anomaly detection using vision-language models'.
Join us to explore the next generation of models that can detect the unexpected.
21st Workshop on Perception Beyond the Visible Spectrum (PBVS'2025) Thu 12 Jun 08:30 a.m.
Workshop: SyntaGen: Harnessing Generative Models for Synthetic Visual Datasets Thu 12 Jun 08:30 a.m.
Workshop: What is Next in Multimodal Foundation Models? Thu 12 Jun 08:30 a.m.
Workshop: Vision Meets Physics: Synergizing Physical Simulation and Computer Vision Thu 12 Jun 08:45 a.m.
This workshop explores the evolving intersection of computer vision and physics, where two competing perspectives—physics-based simulations versus data-driven approaches like video foundation models—seek to model the world effectively. By bringing together researchers from both fields, the event aims to foster collaboration, identify synergies, and advance applications in scientific research, generative AI, robotics, gaming, and extended realities (XR). Through presentations and discussions, the workshop will promote interdisciplinary dialogue to develop next-generation technologies that combine physics-based and data-driven methods, ultimately enhancing realistic simulations for immersive environments, automated tasks, and seamless virtual-physical integration.
The 3rd Workshop on Sign Language Recognition, Translation and Production Thu 12 Jun 08:45 a.m.
Sign languages are visual languages and a key form of communication for deaf communities. Thanks to recent advances in deep learning and computer vision and the availability of larger datasets, significant progress has been made in sign language technologies. Following the first and second editions, this workshop is motivated by the desire to broaden participation in sign language research from the computer vision community. It aims to bring together researchers working on different aspects of vision-based sign language research and sign language linguists to explore recent advances and future directions in sign language recognition, translation, and production.
Please visit our schedule page for details: https://slrtpworkshop.github.io/schedule/
Second Joint Egocentric Vision (EgoVis) Workshop Thu 12 Jun 08:45 a.m.
Egocentric devices like wearable cameras, smart glasses, and AR/VR headsets are rapidly evolving to automatically recognize user actions, environments, gestures, and social interactions. This workshop serves as a central gathering point for the egocentric vision community to exchange ideas and explore this fast-growing field. It features challenges across five major datasets (EPIC-Kitchens, Ego4D, Ego-Exo4D, HoloAssist, HD-EPIC), keynote talks from leading experts, abstract presentations on emerging ideas, EgoVis award to seminal papers from 2023/2024, and poster sessions on pivotal papers—offering a comprehensive look at the future of egocentric perception and wearable AI.
2nd Workshop on Embodied "Humans": Symbiotic Intelligence between Virtual Humans and Humanoid Robots Thu 12 Jun 08:50 a.m.
This workshop aims to explore the pathway toward building “Embodied Humans”—intelligent humanoid agents capable of both physical action and cognitive reasoning like humans—where the boundary between digital avatars and physical humanoid robots could be dissolved through their co-evolution across virtual and real worlds. We will examine the possibility of this synergy through three core dimensions: 1) how can humanoid robots learn foundational “genes” from avatars? 2) how can virtual humans gain physical plausibility from robots' embodiment to enrich realism and interactivity? and 3) how can both systems develop self-autonomy to perceive, plan, and act in dynamic, open-ended environments? Featuring academic researchers and industry experts as invited speakers and panelists, the workshop brings together perspectives from virtual avatar modeling and humanoid robot learning to explore how systems on both ends are progressing toward human-like capacities for perception, reasoning, and movement. Through advanced techniques—such as reinforcement learning, cognition modeling, motion and structure perception, geometric representations, multimodal simulation, and large language/vision/action models—we aim to understand how virtual humans are evolving beyond surface-level realism, and how humanoid robots are advancing beyond pre-scripted skills—enabling both to engage the world with situational understanding, behavioral adaptability, and autonomous intent. At the heart of this workshop lie two essential questions: What makes a virtual human real—not just to see, but to know? And what does it take for a humanoid robot to not just move, but to become?
Workshop: ScanNet++ Novel View Synthesis and 3D Semantic Understanding Challenge Thu 12 Jun 08:50 a.m.
Recent advances in generative modeling and semantic understanding have spurred significant interest in the synthesis and understanding of 3D scenes. Many application areas, for instance augmented and virtual reality, computational photography, interior design, and autonomous mobile robots, require a deep understanding of 3D scene spaces. The ScanNet++ workshop offers the first benchmark challenge for novel view synthesis in large-scale 3D scenes, along with high-fidelity, large-vocabulary 3D semantic scene understanding, where very complete, high-fidelity ground-truth scene data is available. This is enabled by the new ScanNet++ dataset, which offers 1mm-resolution laser scan geometry, high-quality DSLR image capture, and dense semantic annotations over 1000 class categories. Notably, existing view synthesis work leverages data captured from a single continuous trajectory, making evaluation of novel views outside the original capture trajectory impossible. In contrast, our novel view synthesis challenge uses test images captured intentionally outside the train image trajectory, allowing for comprehensive evaluation of state-of-the-art methods in new, challenging scenarios.
Workshop: Rhobin2025: The Third Rhobin Challenge on Reconstruction of Human-Object Interaction Thu 12 Jun 08:55 a.m.
Workshop: Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models Thu 12 Jun 09:00 a.m.
The CVPR community has long focused on evaluating AI systems for their general scene-understanding capabilities. However, as these models are deployed globally, it is essential that they also understand cultural concepts and values, ensuring they cater to the diverse needs of users. This workshop expands computer vision frontiers by bringing together researchers from computer vision, natural language processing, AI ethics, and cultural anthropology to discuss how we can build geo-diverse and culturally aware vision-language models (or AI models in general). First, the workshop will focus on evaluating the types of tasks, benchmarks, and metrics we should develop to advance AI systems' capabilities in this area and explore promising approaches to overcome the challenges. Second, the workshop will benchmark progress in geo-diverse and cultural understanding of vision-language models through the CulturalVQA and GlobalRG challenges, which will test critical abilities such as visual question answering and grounding in culturally diverse scenarios. The insights from this workshop extend beyond computer vision, with significant implications for fields like healthcare, education, and e-commerce, where culturally aligned AI can enhance user experiences. Additionally, the workshop aims to inspire further research in AI ethics, fairness, and responsible AI deployment.
Workshop: 7th Safe Artificial Intelligence for All Domains (SAIAD) Thu 12 Jun 09:00 a.m.
Workshop: Visual Generative Modeling: What’s After Diffusion? Thu 12 Jun 09:00 a.m.
In recent years, diffusion models have rapidly overtaken previous methods to become the dominant approach in visual generative modeling, with widespread applications in generating images, videos, 3D objects, and more. However, these models also come with notable limitations, such as slow generation speeds, limited human intervention during the generation process, and challenges in modeling complex distributions like long videos.
This year, our Visual Generative Modeling workshop at CVPR aims to explore what lies beyond diffusion models in visual generative modeling. We will discuss novel insights, alternative approaches, and new possibilities in modeling and generating visual data. Join us for a full-day event featuring keynote talks from both academia and industry, all designed to ignite innovative ideas and novel research in visual generative modeling.
6th Embodied AI Workshop (EAI) Thu 12 Jun 09:00 a.m.
The Sixth Annual Embodied AI Workshop brings together researchers from computer vision, language, graphics and robotics to share the latest advances in embodied intelligent agents that see, talk, listen, reason, and act in bodies within interactive environments. This year's workshop focuses on Real World Applications, with topics including Embodied AI Solutions, Advances in Simulation, Generative Methods, and Foundation Models. The workshop will feature invited talks, a poster session, and panel discussions. Also, the sixth iteration of the workshop continues its tradition of highlighting several embodied AI challenges that advance the state of the art in the field.
Workshop: Test-time Scaling for Computer Vision Thu 12 Jun 09:00 a.m.
Workshop: Another Brick in the AI Wall: Building Practical Solutions from Theoretical Foundations Thu 12 Jun 09:00 a.m.
The shift towards foundation models has overshadowed the unique insights of deep learning theory, resulting in a loss of valuable knowledge and resources for the community. As machine learning and computer vision extend into new domains, such as biology, a deeper understanding of vision tasks becomes increasingly important. This workshop will provide a crucial platform for discussing the systematic challenges of integrating theory and practice. Concretely, to bridge the gap between theoretical research in machine learning and its practical applications, the workshop aims to explore how theoretical tools can be leveraged to perform rigorous worst-case analysis, crucial for deploying machine learning models in safety-critical societal domains like healthcare, education, and sustainability.
Workshop: AI for Content Creation Thu 12 Jun 09:00 a.m.
AI for content creation plays a crucial role in domains such as photography, videography, virtual reality, gaming, art, design, fashion, and advertising, and lies at the intersection of computer vision, machine learning, computer graphics, and design. This workshop will provide attendees with a slice of cutting-edge techniques within this rapidly evolving field, considering both the fundamental technologies and practical challenges faced by designers and content creators, and will show successful applications of AI and deep learning in content creation. With invited speakers of world-class expertise in content creation, up-and-coming researchers, and posters from authors of submitted workshop papers, the workshop will help all to engage in a day filled with learning, discussion, and network building.
Workshop: Spatial Intelligence for Cultural Heritage Thu 12 Jun 09:00 a.m.
Workshop: Mechanistic Interpretability for Vision Thu 12 Jun 09:00 a.m.
Workshop: Multi-Agent Embodied Intelligent Systems Meet Generative-AI Era: Opportunities, Challenges and Futures Thu 12 Jun 09:00 a.m.
Workshop: LOVE: Multimodal Video Agent Thu 12 Jun 09:00 a.m.
5th Workshop on CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling Thu 12 Jun 09:20 a.m.
Many biological organisms have evolved to exhibit diverse behaviors, and understanding these behaviors is a fundamental goal of multiple disciplines including neuroscience, biology, animal husbandry, ecology, and animal conservation. These analyses require objective, repeatable, and scalable measurements of animal behaviors that are not possible with existing methodologies that leverage manual encoding from animal experts and specialists. Recently, computer vision has been making a significant impact across multiple disciplines by providing new tools for the detection, tracking, and analysis of animal behavior. This workshop brings together experts across fields to stimulate this new field of computer-vision-based animal behavioral understanding.
Workshop: Computer Vision for Drug Discovery: Where are we and What is Beyond? Thu 12 Jun 09:30 a.m.
The workshop aims to bridge the gap between computer vision, artificial intelligence, and the life sciences, with a focus on transformative advancements in drug discovery. By integrating innovative imaging modalities, such as Spatial Transcriptomics, Cell Painting, and Optical Pooled Screening, with state-of-the-art computer vision techniques, this workshop seeks to foster collaboration between experts in biomedical science, AI, and computer vision.

The workshop highlights the potential for revolutionizing drug discovery processes, driving faster and more accurate identification of therapeutic targets, and expediting the development of treatments for complex diseases. Addressing pressing challenges like cancer, neurodegenerative disorders, and pandemics, the focus lies on leveraging AI to analyze high-dimensional biological data, enhancing our understanding of disease mechanisms and responses to therapies.

For the CVPR community, this represents an exciting opportunity to expand beyond traditional image processing tasks into applications with tangible societal impact. By applying computer vision expertise to critical healthcare and pharmaceutical challenges, participants will engage with tasks like multi-modal data fusion, enhancing explainability in biomedical applications, and addressing the unique complexities of biological imaging, such as sparse or noisy datasets.

This workshop is aligned with CVPR’s growing emphasis on “AI for social good,” offering computer vision researchers a platform to contribute to advances in medical science that could improve the lives of millions. It is a call to action for interdisciplinary innovation, uniting diverse expertise to tackle some of the most critical challenges in global health.
Workshop: Agent in Interaction, from Humans to Robots Thu 12 Jun 09:30 a.m.
The Seventh Workshop on Precognition: Seeing through the Future Thu 12 Jun 12:00 p.m.
Vision-based detection and recognition studies have recently achieved highly accurate performance, bridging the gap between research and real-world applications. Beyond these well-explored detection and recognition capabilities of modern algorithms, vision-based forecasting is likely to be one of the next big research topics in computer vision. Visual prediction is one of the critical capabilities of humans, and the success of automatic vision-based forecasting would empower and unlock human-like capabilities in machines and robots.
One example application is autonomous driving, where vision-based understanding of a traffic scene and prediction of the movement of traffic actors is a critical piece of the autonomous puzzle. Another area where vision-based prediction is used is the medical domain, where it enables deep understanding and prediction of patients' future medical conditions. However, despite its potential and relevance to real-world applications, visual forecasting, or precognition, has not been the focus of new theoretical studies and practical applications as much as detection and recognition problems have.
Through this workshop, we aim to foster discussion and interest within the research community around this nascent topic. The workshop will cover recent approaches and research trends not only in anticipating human behavior from videos but also in precognition across multiple other visual applications, such as medical imaging, healthcare, face aging prediction, early event prediction, and autonomous driving forecasting.
2nd Workshop on Efficient and On-Device Generation (EDGE) Thu 12 Jun 12:30 p.m.
Workshop: Multi-modal Learning for Materials Science Thu 12 Jun 12:30 p.m.
Workshop: Visual Modeling Challenges for 2D-3D Virtual Try-On Thu 12 Jun 12:30 p.m.
Tutorial: Wenjin Wang · Daniel McDuff
Intelligent Healthcare based on Cameras and Wireless Sensors
This tutorial explores contactless health monitoring using cameras and RF sensors. Topics include measuring vital signs from skin or body imagery, emotion recognition, sleep staging, and activity recognition. It covers radar, WiFi, RFID, and acoustic-based RF sensing, highlighting multi-modal techniques that improve monitoring in healthcare, telemedicine, sports, and driver safety.
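To make the camera-based vital-sign idea concrete, here is a minimal, hypothetical sketch (not the presenters' code; all names are illustrative) of one common remote-photoplethysmography baseline: recovering a pulse rate from a mean green-channel trace via a Fourier peak in the plausible heart-rate band.

```python
import numpy as np

def heart_rate_from_signal(green_means, fps):
    """Estimate pulse rate (BPM) from a mean green-channel trace via FFT.

    A toy rPPG baseline: detrend, take the power spectrum, and pick the
    dominant frequency inside a plausible human pulse band.
    """
    x = np.asarray(green_means, dtype=np.float64)
    x = x - x.mean()  # remove the DC component (average brightness)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    power = np.abs(np.fft.rfft(x)) ** 2
    # restrict to 0.7-4.0 Hz, i.e. roughly 42-240 beats per minute
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][power[band].argmax()]

# synthetic 10 s trace at 30 fps with a 1.2 Hz (72 BPM) pulse plus noise
fps = 30
t = np.arange(300) / fps
sig = 0.02 * np.sin(2 * np.pi * 1.2 * t)
sig += 1e-3 * np.random.default_rng(0).normal(size=t.size)
bpm = heart_rate_from_signal(sig, fps)
```

Real systems add skin-region tracking, motion compensation, and multi-channel color projections; this sketch only shows the spectral-peak core of the idea.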
Tutorial: Thomas Pfeil
Power-efficient neural networks using low-precision data types and quantization
As neural networks grow, sustainability and cost become major challenges. This tutorial covers low-precision data types, quantization methods, and hands-on applications. Attendees will gain tools to maintain model performance while optimizing for efficiency on edge and large-scale deployments.
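As a rough illustration of the kind of method such a tutorial covers (a toy sketch, not the presenter's material), the following applies symmetric post-training int8 quantization to a weight tensor: a single scale maps the largest magnitude to 127, and dequantization recovers an approximation of the original weights.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization of a weight tensor to int8.

    One scale per tensor; the largest-magnitude weight maps to +/-127.
    """
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32 approximations of the weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# rounding error is bounded by half a quantization step
max_err = float(np.abs(w - w_hat).max())
```

Per-channel scales, asymmetric (zero-point) schemes, and quantization-aware training refine this basic recipe, which is the usual starting point for edge deployment.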
Tutorial: Tianyu Yang
This tutorial surveys the growing field of multimodal mathematical reasoning, combining CV, NLP, and symbolic logic. It addresses diagram interpretation, symbolic notation, and multi-step logic. Attendees will explore datasets, models, and evaluation, and discuss applications in education and science.
Workshop on Perception for Industrial Robotics Automation Thu 12 Jun 01:00 p.m.
This workshop addresses the gap between cutting-edge computer vision research and its practical application in industrial robotics, focusing on challenges in tasks such as reliable, scalable, and cost-effective bin picking. It brings together researchers and practitioners to discuss topics including 3D scene understanding, embodied AI, and robot learning, with an emphasis on developing robust solutions by considering factors like embodiment, camera choice, and data needs. Complementing the workshop, the Perception Challenge for Bin Picking offers a practical platform for participants to tackle real-world 6DoF pose estimation problems using a robot-in-the-loop evaluation, providing a more realistic performance assessment than traditional vision-only metrics. Together, the workshop and challenge aim to accelerate the adoption of vision-guided robotics and enhance the efficiency of industrial automation.
11th IEEE International Workshop on Computer Vision in Sports Thu 12 Jun 01:00 p.m.
Sports is often called the social glue of society: it allows people to interact irrespective of social status, age, and other differences. With the rise of mass media, significant resources have been channeled into sports to improve understanding, performance, and presentation. For example, performance assessment, previously mainly of interest to coaches and sports scientists, is now finding applications in broadcast and other media, driven by the growth of online sports viewing, which makes all sorts of performance statistics available to viewers. Computer vision has recently started to play an important role in sports, as seen, for example, in football, where real-time computer-vision-based graphics enhance different aspects of the game. Computer vision algorithms have huge potential in many aspects of sports, ranging from automatic annotation of broadcast footage to better understanding of sports injuries, coaching, and enhanced viewing. So far, the use of computer vision in sports has been scattered across different disciplines. The ambition of this workshop is to bring together practitioners and researchers from different disciplines to share ideas and methods on the current and future use of computer vision in sports.
8th Workshop and Competition on Affective & Behavior Analysis in-the-wild Thu 12 Jun 01:00 p.m.
The ABAW Workshop is a premier platform highlighting the latest advancements in multimodal analysis, generation, modeling, and understanding of human affect and behavior in real-world, unconstrained environments. It emphasizes cutting-edge systems that integrate facial expressions, body movements, gestures, natural language, voice, and speech to enable impactful research and practical applications. The workshop fosters interdisciplinary collaboration across fields such as computer vision, AI, human-machine interaction, psychology, robotics, ethics, and healthcare. It further addresses complex challenges such as algorithmic fairness, demographic bias, and data privacy, making it a vital forum for building equitable, generalizable, and human-centered AI systems. By uniting experts from academia, industry, and government, the workshop promotes innovation, drives knowledge exchange, and inspires new directions in affective computing, behavior modeling and understanding, and human-computer interaction. Finally, the workshop includes a competition with six challenges, including valence-arousal estimation, basic and compound expression recognition, action unit detection, emotional mimicry intensity estimation, and ambivalence/hesitancy recognition.
Tutorial: Jason Clemons, Hongxu (Danny) Yin, and Xinglong Sun
Full-Stack, GPU-based Acceleration of Deep Learning and Foundation Models
This tutorial offers insights across the hardware-software stack to accelerate deep neural networks, from convolutions to multimodal LLMs. Attendees will learn practical tools and trade-offs to optimize performance and inspire the next generation of scalable acceleration techniques.
Workshop: Domain Generalization: Evolution, Breakthroughs, and Future Horizons Thu 12 Jun 01:00 p.m.
The 6th International Workshop and Prize Challenge on Agriculture-Vision: Challenges & Opportunities for Computer Vision in Agriculture in conjunction with IEEE CVPR 2025 Thu 12 Jun 01:00 p.m.
With the recent success of computer vision and deep learning in various applications, there has been significantly increasing attention towards their use in agriculture, presenting significant economic and social opportunities. The 6th International Workshop and Prize Challenge on Agriculture-Vision aims to foster research and applications at the intersection of computer vision and agriculture, addressing challenges in real-world agricultural scenarios, building on a strong record from prior editions at CVPR 2020-2024. The workshop will feature a computer vision challenge and invited speakers from diverse academic and industry backgrounds, including computer vision, robotics, and agriculture, as well as top industry practitioners. This event provides a platform to showcase current progress in these interdisciplinary areas and to encourage further research and development of foundation models in agriculture.
Workshop: Open-World 3D Scene Understanding with Foundation Models Thu 12 Jun 01:00 p.m.
Workshop: VizWiz Grand Challenge Thu 12 Jun 01:00 p.m.
First Workshop on Experimental Model Auditing via Controllable Synthesis (EMACS) Thu 12 Jun 01:00 p.m.
With the increasing adoption of machine learning models in high-stakes applications, rigorous audits of model behavior have assumed paramount importance. However, traditional auditing methods fall short of being truly experimental, as they rely on wild-caught observational data that has been manually labeled. Enter generative techniques, which have recently shown impressive capabilities in automatically generating and labeling high-quality synthetic data at scale. Critically, many such methods allow for the isolation and manipulation of specific attributes of interest, paving the path towards robust experimental analysis.

This workshop is dedicated to exploring techniques for auditing the behavior of machine learning models, including (but not limited to) performance, bias, and failure modes, through the controlled synthesis (via generation or simulation) of data. Of special interest are algorithms for generating data (images, text, audio, etc.) and benchmarking methods that provide reliable insights into model behavior by minimizing the impact of potential confounders. We also welcome work on the broader topic of using synthetic or quasi-synthetic data for model debugging, broadly construed, with the goal of providing a venue for interdisciplinary exchange of ideas on this emerging topic.
Tutorial: Zhengyuan Yang
Recent Advances in Vision Foundation Models
This tutorial covers cutting-edge developments in vision foundation models. Topics include multimodal understanding and generation, scaling test-time compute, and applications for physical and virtual agents. The session will provide insights into the design and future directions of vision-based foundation models.
The first Workshop on Enforcing Geometric, Physical, Topological, and Functional Inductive Bias in 3D Generation Thu 12 Jun 01:00 p.m.
Tutorial: Constantin Seibold
This tutorial explores techniques for dataset curation, quality monitoring, dimensionality reduction (t-SNE, UMAP, h-NNE), and clustering (k-means, DBSCAN, FINCH). Attendees will learn how to use these methods to understand structure, reduce bias, detect outliers, and improve performance in AI and CV workflows.
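To illustrate one of the named techniques, here is a minimal, self-contained k-means sketch (a toy baseline written for this listing, not the presenter's code); in practice one would use a tuned library implementation such as the FINCH or DBSCAN variants the tutorial discusses.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k distinct data points
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels, centers

# two well-separated 2D blobs: clustering should recover the split
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
labels, centers = kmeans(X, 2)
```

In a curation workflow, the resulting cluster labels are typically inspected (e.g. after a 2D embedding with t-SNE or UMAP) to surface duplicate groups, outliers, and under-represented modes in the dataset.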
Workshop: AI for Creative Visual Content Generation, Editing and Understanding Thu 12 Jun 01:00 p.m.
Visual content creation is booming, yet producing engaging visual content remains a challenging task. This workshop aims to highlight machine learning technologies that accelerate and enhance creative processes in visual content creation and editing, including image animation, text-to-visual content generation, and content translation. Moreover, we believe that advancing technology to better understand edited visual content can enable novel platforms for creating compelling media. We seek to bridge the gap between technical and creative communities by bringing together researchers from computer vision, graphics, and the arts, fostering interdisciplinary collaboration and exploring opportunities in this under-explored area.
The 4th Workshop on Transformers for Vision Thu 12 Jun 01:20 p.m.
Catch UAVs that Want to Watch You: Detection and Tracking of Unmanned Aerial Vehicle (UAV) in the Wild and the 4th Anti-UAV Workshop & Challenge Thu 12 Jun 01:30 p.m.
Workshop on 3D Human Understanding Thu 12 Jun 01:30 p.m.
ReGenAI: Second Workshop on Responsible Generative AI Thu 12 Jun 01:30 p.m.
Workshop: Physics-inspired 3D Vision and Imaging Thu 12 Jun 01:45 p.m.
3D computer vision has become fundamental to technologies ranging from medical imaging to astronomy and from AR/VR to embodied intelligence. New sensors and imaging modalities like structured light, time-of-flight, and light field microscopy are being developed to make 3D vision more tractable; but even with new types of sensor data, many problems in 3D vision are ill-posed, and solving them often relies on heuristics or data-driven priors. Unfortunately, these priors can fail, especially for problems where ground-truth data is not available or for niche sensors where capturing large datasets is not feasible. A promising, but often overlooked, alternative is to incorporate knowledge of physics (e.g., physical light transport) into 3D computer vision algorithms, which can better constrain the solutions they produce.
The goal of this workshop is to highlight work in 3D computer vision and imaging that makes use of physics-inspired modeling and physical-priors, showcasing their importance even with the prevalence of neural priors and big data. Examples include methods that apply physics-based approaches to inverse rendering, 3D microscopy, tomography, and light-in-flight imaging; or methods that combine such approaches with novel tools like neural radiance fields (NeRFs), 3D Gaussian Splatting (3DGS), and generative image/video models.