Registration Desk: Registration / Badge Pickup Thu 12 Jun 07:00 a.m.
Workshop: Three things everyone should ask about photorealistic virtual try-on. Thu 12 Jun 08:00 a.m.
Virtual Try-On (VTON) promises to transform the apparel e-commerce industry, offering benefits for shoppers, businesses, and the environment. This workshop will address three key challenges that must be overcome to realize VTON's full potential: achieving high-fidelity, rapid video try-ons; accurately predicting 3D garment size and improving 3D human body reasoning; and defining robust metrics for synthesis quality that avoid offensive results across diverse demographics. Addressing these VTON-specific challenges will necessitate fundamental advancements in generative image and video synthesis, offering broader impact within the computer vision and machine learning communities.
Workshop on Distillation of Foundation Models for Autonomous Driving Thu 12 Jun 08:00 a.m.
Though start and end times here are correct, detailed schedules here may not be complete or up to date. Please cross-reference the workshop's website, where available, to verify schedule details. (Added by CVPR.)
The 1st Workshop on Distillation of Foundation Models for Autonomous Driving (WDFM-AD) aims to advance the deployment of large foundation models—such as vision-language models (VLMs) and generative AI (GenAI) models—within autonomous driving systems through efficient distillation techniques. Building on the momentum of prior workshops focused on large language and vision models for autonomous driving, WDFM-AD provides a dedicated platform for researchers and industry practitioners to explore methods that bridge cutting-edge foundation model research with real-world deployment, particularly under the stringent latency and resource constraints of autonomous vehicles. By addressing the challenges of compressing, aligning, and deploying foundation models for self-driving, WDFM-AD seeks to accelerate their safe, efficient, and scalable integration into next-generation autonomous driving systems.
C3DV: 3rd Workshop on Compositional 3D Vision Thu 12 Jun 08:00 a.m.
Tutorial: Orhun Aydin
Geospatial Computer Vision and Artificial Intelligence for Large-Scale Earth Observation Data
Earth observation (EO) data has applications in agriculture, disaster management, and security. This tutorial explores integrating CV and EO data using diverse sensing types. Attendees will learn about open-source tools, multimodal reasoning, geospatial foundation models, and hands-on analysis of EO data for environmental and climate monitoring.
Tutorial: Sean Fanello
Sense, Perceive, Interact & Render on Android XR
This tutorial details the perception stack built for Android XR, including head, hand, face, and eye tracking. It covers data capture, rendering, photorealistic avatars, and scene understanding. Use cases highlight the stack's application in immersive and interactive experiences.
Tutorial: Srikumar Ramalingam
Efficient Text-to-Image/Video modeling
We are witnessing groundbreaking results from text-to-image and text-to-video models. However, the generation process in these models is iterative and computationally expensive, and there is a growing need to make these algorithms faster so they can serve millions of users efficiently. This course focuses on techniques such as progressive parallel decoding, distillation methods, and Markov Random Fields to accelerate text-to-image and text-to-video models. The course also critiques popular evaluation techniques such as FID and introduces efficient alternatives such as CMMD.
WorldModelBench: The First Workshop on Benchmarking World Foundation Models Thu 12 Jun 08:00 a.m.
World models are predictive systems that enable Physical AI agents to understand, decide, plan, and analyze counterfactuals through integrated perception, instruction processing, controllability, physical plausibility, and future prediction capabilities. The past year has witnessed significant advancements from both academic and industrial research teams, with various models utilizing different conditioning approaches (text, image, video, control) being released openly and commercially. While these developments enable applications in content creation, autonomous driving, and robotics, the models' diversity in training methods, data sources, architecture, and input processing necessitates critical evaluation. The WorldModelBench workshop aims to address this need by fostering discussions on evaluation criteria (physical correctness, prompt alignment, generalizability), metrics development, standardized methodologies, and crucial topics including accessible benchmarking, quantitative evaluation protocols, downstream task assessment, and safety/bias considerations in world models.
Tutorial: Chris Padwick
Multi-Modal Computer Vision and Foundation Models In Agriculture in conjunction with IEEE CVPR 2025
With the recent success of computer vision and deep learning in various applications, there has been significantly increasing attention paid to its use in agriculture. Agriculture-related vision problems are of great economic and social value. For example, robotics has recently been reinvigorated with work on Vision-Language-Action models. Building on these successes, researchers are using multi-modal computer vision foundation models to make progress on agricultural tasks and topics. Some relevant examples include: 1) Agricultural models that leverage data from different remote sensing platforms; 2) Multi-temporal yield prediction models using unsupervised domain adaptation; 3) Multi-modal models for identifying pests and weeds. This tutorial will encourage research in ML, CV, and agriculture, featuring leading researchers discussing the evolution and trends in this field.
5th International Workshop on Event-based Vision Thu 12 Jun 08:00 a.m.
The Event-based Vision Workshop at CVPR is the premier venue for discussing exciting new ideas about neuromorphic cameras and their processing methods. It covers the sensing hardware, as well as the processing, data, and learning methods needed to take advantage of event-based cameras. The workshop aims to highlight an emerging field with the potential to overcome many of the limitations of frame-based systems (speed, power consumption, robustness to HDR illumination, etc.). This forum fosters community building around these novel cameras, capitalizing on a growing interest and increasing contributions at the main conference. Furthermore, the workshop seeks to connect with a broader audience by highlighting interdisciplinary links between computer vision, robotics, artificial intelligence, computational neuroscience, and psychology, as event cameras facilitate research into replicating the efficiency and robustness of the human visual system.
Tutorial: Lukas Picek
This tutorial introduces the field of individual animal re-identification (ReID), crucial for ecological monitoring, conservation, and ethical wildlife research. Accurate animal ReID supports long-term monitoring of endangered species, combatting poaching, and understanding animal behavior. This half-day hybrid tutorial includes multiple talks and a panel discussion to encourage interaction and research directions.
Workshop: DriveX - Foundation Models for V2X-Based Cooperative Autonomous Driving Thu 12 Jun 08:00 a.m.
1st International Workshop on Interactive Video Search and Exploration (IViSE) Thu 12 Jun 08:00 a.m.
Workshop on Foundation and Large Vision Models in Remote Sensing Thu 12 Jun 08:00 a.m.
This workshop will feature keynotes and presentations at the cutting edge of foundation models and large vision models for remote sensing. It will bring together researchers working on foundation and large vision models and on geospatial image analysis to address the nuances of applying such emergent models to remotely sensed imagery (e.g., a multitude of sensors with different sensing characteristics/specifications; diverse imaging modalities, ranging from passive-optical multi/hyperspectral to active imaging such as SAR and LiDAR; limited ground-reference data; etc.). Our emphasis will range from large vision and foundation models that are showing promise in the computer vision community to foundation models that are pre-trained on large quantities of earth-observation imagery. This workshop will provide a venue for the community to present works that push the envelope on adapting these models for effective inference of multi-sensor, multi-temporal, multi-scale earth observation imagery.
Workshop: Efficient Large Vision Models Thu 12 Jun 08:00 a.m.
This workshop focuses on the core principles of efficiency in large-scale vision models. How do we minimize redundant operations in generative models without compromising quality? Can autoregressive decoding and diffusion sampling be accelerated through parallelization? What are the trade-offs between compression, quantization, and expressivity? We seek to advance new directions in compact model representations, adaptive computation, parallel decoding, and structured sparsity—approaches that go beyond incremental optimizations and redefine how LVMs operate.
We invite researchers working on fast and scalable vision architectures, low-cost inference, and efficient generative models to share their insights. Whether through sampling acceleration, efficient transformers, new architectural paradigms, or theoretical limits of model compression, this workshop provides a platform to discuss how LVMs can be optimized for both performance and practicality.
Join us in shaping the next generation of vision models—where efficiency is not just a constraint, but a driving force for innovation.
Workshop on 3D-LLM/VLA: Bridging Language, Vision and Action in 3D Environments Thu 12 Jun 08:00 a.m.
This workshop addresses a critical gap in current AI research by focusing on the integration of language and 3D perception, which is essential for developing embodied agents and robots, especially considering the recent rise of multimodal LLMs and vision-language-action (VLA) models.
The workshop will explore challenges and opportunities in this area, providing a platform for researchers to share their work, discuss future directions, and foster collaboration across disciplines including robotics, computer vision, natural language processing, and human-computer interaction.
Tutorial: Nadine Chang
Continuous Data Cycle via Foundation Models
Foundation models are being continuously integrated into applications like autonomous driving and diagnostics. This tutorial explores the data-model feedback loop: how foundation models affect data curation and vice versa. Talks cover leveraging foundation models to build efficient data engines, enhancing model performance, and addressing data relevance, scale, and quality.
Workshop on Visual Concepts Thu 12 Jun 08:00 a.m.
Visual concept discovery aims to extract compact and structured representations of the visual world and recompose them to tackle novel, intricate problems. It has played a crucial role in many core problems in computer vision research, including both discriminative and generative tasks. An important research question is how to understand and design concept representations that facilitate better learning from various datasets and compositional reasoning. As an endeavor to answer this question, this workshop gathers researchers in computer vision, multi-modal learning, machine learning, and cognitive science to discuss the development and interpretation of visual concept learning systems and their applications.
Workshop on Computer Vision for Microscopy Image Analysis Thu 12 Jun 08:00 a.m.
Tutorial: Vishnu Naresh Boddeti
Computer Vision over Homomorphically Encrypted Data
Over the past decade, computer vision (CV) systems have become integral to healthcare, surveillance, and personal devices. The sensitive nature of data and models raises privacy concerns. Fully homomorphic encryption (FHE) allows computations on encrypted data, ensuring privacy. This tutorial explores integrating FHE into CV, addressing its challenges, mathematical foundations, key FHE schemes, SIMD capabilities, and hands-on demonstrations. It covers private and encrypted CV tasks and discusses open research directions.
Tutorial: Fabricio Narcizo
Edge AI in Action: Technologies and Applications
Edge AI in Action is a hands-on tutorial exploring practical tools to develop and deploy AI models on resource-constrained devices. Topics include model optimization, deployment of LLMs and CV models, and integration with cloud-edge architectures. Demonstrations include devices like Raspberry Pi, iPhones, and Androids. Attendees will gain actionable insights into real-world Edge AI.
Tutorial: Chonghao Sima
Robotics 101: An Odyssey from A Vision Perspective
This full-day tutorial offers a vision-focused introduction to robotics. It covers foundational background, technical advancements, key challenges, and emerging directions. With diverse speakers from multiple domains, the tutorial is divided into two sessions: 'Perceive the World' and 'Interact with the World', addressing perception and interaction in robotics.
Tutorial: Viktoria Ehm
3D Shape Analysis: From Classical Optimization to Learning-based Matching
3D shape analysis deals with extracting information from geometric data, with applications in driving, biomedicine, and AR/VR. This tutorial covers classical shape matching methods (linear and quadratic assignment problems), product graph formalisms, learning-based correspondence, spectral methods, and real-world applications. Challenges and future directions are also addressed.
11th Workshop on Medical Computer Vision Thu 12 Jun 08:15 a.m.
PixFoundation: Workshop on Pixel-level Vision Foundation Models Thu 12 Jun 08:30 a.m.
In recent years, foundation models have gained significant traction and success. Such foundation models have been shown to adapt effectively across various downstream tasks, with strong generalization capabilities, especially in zero-shot and few-shot scenarios. There is growing interest and progress specifically in vision foundation models (VFMs). Some of the latest models include those trained using self-supervision, such as the DINO series, and those utilizing image/text pairs, such as CLIP, Flamingo, LLaVA, and Cambrian. Various pixel-level vision foundation models have also emerged for image/video referring segmentation and depth estimation. Our workshop aims to bring together researchers dedicated to developing and adapting vision foundation models for pixel-level understanding tasks, including image/video segmentation, referring image/video segmentation and reasoning, tracking, depth estimation, and motion estimation. We will explore major directions in pixel-level understanding with vision foundation models and discuss the opportunities they present, particularly in low-resource settings where they could have a positive societal impact. Additionally, we will discuss the risks associated with these models and explore methods to mitigate them. The workshop features seven invited talks, mixing emerging and established researchers, along with posters and selective spotlight presentations.
The 5th Workshop of Adversarial Machine Learning on Computer Vision: Foundation Models + X Thu 12 Jun 08:30 a.m.
Workshop: Women in Computer Vision Thu 12 Jun 08:30 a.m.
Workshop: 3D Digital Twin: Progress, Challenges, and Future Directions Thu 12 Jun 08:30 a.m.
Despite the growing momentum around 3D reconstruction and generative AI in computer vision, a critical gap remains: how to create photorealistic, fully functional 3D digital twins that are indistinguishable from their real-world counterparts and enable practical applications. This workshop tackles that challenge by spotlighting 3D digital twin creation technologies and their broad impact across AR/VR, spatial and contextual AI, and robotics. Distinguished speakers from diverse disciplines will share cutting-edge digital twin creation techniques and real-world use cases. Additionally, we are excited to launch a benchmark and challenge for 3D digital twin creation, built on our Digital Twin Catalog (DTC) dataset and supported by open-source baselines. This initiative aims to spark meaningful discussion, foster collaboration, and accelerate progress in both academic research and practical deployment.
Workshop: VAND: Visual Anomaly and Novelty Detection - 3rd Edition Thu 12 Jun 08:30 a.m.
Anomaly detection—also known as novelty or out-of-distribution detection—is a key challenge in computer vision and pattern recognition. From medical imaging to industrial inspection, spotting what doesn’t belong is critical, yet notoriously hard. Why? Because anomalies can take unlimited forms, and most models see only a narrow slice of the possible "normal" during training.
The VAND workshop brings together cutting-edge research tackling this open-set problem across supervised, semi-supervised, and unsupervised methods, as well as few-, one-, and zero-shot approaches.
This year, we're also hosting two exciting challenges: (1) 'Adapt & Detect – Robust anomaly detection in real-world applications', and (2) 'VLM Anomaly Challenge – Few-shot learning for logical and structural anomaly detection using vision-language models'.
Join us to explore the next generation of models that can detect the unexpected.
21st Workshop on Perception Beyond the Visible Spectrum (PBVS'2025) Thu 12 Jun 08:30 a.m.
Workshop: SyntaGen: Harnessing Generative Models for Synthetic Visual Datasets Thu 12 Jun 08:30 a.m.
Workshop: What is Next in Multimodal Foundation Models? Thu 12 Jun 08:30 a.m.
Workshop: Vision Meets Physics: Synergizing Physical Simulation and Computer Vision Thu 12 Jun 08:45 a.m.
This workshop explores the evolving intersection of computer vision and physics, where two competing perspectives—physics-based simulations versus data-driven approaches like video foundation models—seek to model the world effectively. By bringing together researchers from both fields, the event aims to foster collaboration, identify synergies, and advance applications in scientific research, generative AI, robotics, gaming, and extended realities (XR). Through presentations and discussions, the workshop will promote interdisciplinary dialogue to develop next-generation technologies that combine physics-based and data-driven methods, ultimately enhancing realistic simulations for immersive environments, automated tasks, and seamless virtual-physical integration.
The 3rd Workshop on Sign Language Recognition, Translation and Production Thu 12 Jun 08:45 a.m.
Sign languages are visual languages and a key form of communication for deaf communities. Thanks to recent advances in deep learning and computer vision and the availability of larger datasets, significant progress has been made in sign language technologies. Following the first and second editions, this workshop is motivated by the desire to broaden participation in sign language research from the computer vision community. It aims to bring together researchers working on different aspects of vision-based sign language research and sign language linguists to explore recent advances and future directions in sign language recognition, translation, and production.
Please visit our schedule page for details: https://slrtpworkshop.github.io/schedule/
Second Joint Egocentric Vision (EgoVis) Workshop Thu 12 Jun 08:45 a.m.
Egocentric devices like wearable cameras, smart glasses, and AR/VR headsets are rapidly evolving to automatically recognize user actions, environments, gestures, and social interactions. This workshop serves as a central gathering point for the egocentric vision community to exchange ideas and explore this fast-growing field. It features challenges across five major datasets (EPIC-Kitchens, Ego4D, Ego-Exo4D, HoloAssist, HD-EPIC), keynote talks from leading experts, abstract presentations on emerging ideas, EgoVis award to seminal papers from 2023/2024, and poster sessions on pivotal papers—offering a comprehensive look at the future of egocentric perception and wearable AI.
2nd Workshop on Embodied "Humans": Symbiotic Intelligence between Virtual Humans and Humanoid Robots Thu 12 Jun 08:50 a.m.
This workshop aims to explore the pathway toward building “Embodied Humans”—intelligent humanoid agents capable of both physical action and cognitive reasoning like humans—where the boundary between digital avatars and physical humanoid robots could be dissolved through their co-evolution across virtual and real worlds. We will examine the possibility of this synergy through three core dimensions: 1) how can humanoid robots learn foundational “genes” from avatars? 2) how can virtual humans gain physical plausibility from robots' embodiment to enrich realism and interactivity? and 3) how can both systems develop self-autonomy to perceive, plan, and act in dynamic, open-ended environments? Featuring academic researchers and industry experts as invited speakers and panelists, the workshop brings together perspectives from virtual avatar modeling and humanoid robot learning to explore how systems on both ends are progressing toward human-like capacities for perception, reasoning, and movement. Through advanced techniques—such as reinforcement learning, cognition modeling, motion and structure perception, geometric representations, multimodal simulation, and large language/vision/action models—we aim to understand how virtual humans are evolving beyond surface-level realism, and how humanoid robots are advancing beyond pre-scripted skills—enabling both to engage the world with situational understanding, behavioral adaptability, and autonomous intent. At the heart of this workshop lie two essential questions: What makes a virtual human real—not just to see, but to know? And what does it take for a humanoid robot to not just move, but to become?
Workshop: ScanNet++ Novel View Synthesis and 3D Semantic Understanding Challenge Thu 12 Jun 08:50 a.m.
Recent advances in generative modeling and semantic understanding have spurred significant interest in the synthesis and understanding of 3D scenes. Many application areas, for instance augmented and virtual reality, computational photography, interior design, and autonomous mobile robots, require a deep understanding of 3D scene spaces. The ScanNet++ workshop offers the first benchmark challenge for novel view synthesis in large-scale 3D scenes, along with high-fidelity, large-vocabulary 3D semantic scene understanding, where very complete, high-fidelity ground-truth scene data is available. This is enabled by the new ScanNet++ dataset, which offers 1mm-resolution laser scan geometry, high-quality DSLR image capture, and dense semantic annotations over 1000 class categories. Notably, existing view synthesis work leverages data captured from a single continuous trajectory, making evaluation of novel views outside the original capture trajectory impossible. In contrast, our novel view synthesis challenge uses test images captured intentionally outside the train image trajectory, allowing for comprehensive evaluation of state-of-the-art methods in new, challenging scenarios.
Workshop: Rhobin2025: The Third Rhobin Challenge on Reconstruction of Human-Object Interaction Thu 12 Jun 08:55 a.m.
Workshop: Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models Thu 12 Jun 09:00 a.m.
The CVPR community has long focused on evaluating AI systems for their general scene-understanding capabilities. However, as these models are deployed globally, it is essential that they also understand cultural concepts and values, ensuring they cater to the diverse needs of users. This workshop expands computer vision frontiers by bringing together researchers from computer vision, natural language processing, AI ethics, and cultural anthropology to discuss how we can build geo-diverse and culturally aware vision-language models (or AI models in general). First, the workshop will focus on evaluating the types of tasks, benchmarks, and metrics we should develop to advance AI systems' capabilities in this area and explore promising approaches to overcome the challenges. Second, the workshop will benchmark progress in geo-diverse and cultural understanding of vision-language models through the CulturalVQA and GlobalRG challenges, which will test critical abilities such as visual question answering and grounding in culturally diverse scenarios. The insights from this workshop extend beyond computer vision, with significant implications for fields like healthcare, education, and e-commerce, where culturally aligned AI can enhance user experiences. Additionally, the workshop aims to inspire further research in AI ethics, fairness, and responsible AI deployment.
Workshop: 7th Safe Artificial Intelligence for All Domains (SAIAD) Thu 12 Jun 09:00 a.m.
Workshop: Visual Generative Modeling: What’s After Diffusion? Thu 12 Jun 09:00 a.m.
In recent years, diffusion models have rapidly overtaken previous methods to become the dominant approach in visual generative modeling, with widespread applications in generating images, videos, 3D objects, and more. However, these models also come with notable limitations, such as slow generation speeds, limited human intervention during the generation process, and challenges in modeling complex distributions like long videos.
This year, our Visual Generative Modeling workshop at CVPR aims to explore what lies beyond diffusion models in visual generative modeling. We will discuss novel insights, alternative approaches, and new possibilities in modeling and generating visual data. Join us for a full-day event featuring keynote talks from both academia and industry, all designed to ignite innovative ideas and novel research in visual generative modeling.
6th Embodied AI Workshop (EAI) Thu 12 Jun 09:00 a.m.
The Sixth Annual Embodied AI Workshop brings together researchers from computer vision, language, graphics and robotics to share the latest advances in embodied intelligent agents that see, talk, listen, reason, and act in bodies within interactive environments. This year's workshop focuses on Real World Applications, with topics including Embodied AI Solutions, Advances in Simulation, Generative Methods, and Foundation Models. The workshop will feature invited talks, a poster session, and panel discussions. Also, the sixth iteration of the workshop continues its tradition of highlighting several embodied AI challenges that advance the state of the art in the field.
Workshop: Test-time Scaling for Computer Vision Thu 12 Jun 09:00 a.m.
Workshop: Another Brick in the AI Wall: Building Practical Solutions from Theoretical Foundations Thu 12 Jun 09:00 a.m.
The shift towards foundation models has overshadowed the unique insights of deep learning theory, resulting in a loss of valuable knowledge and resources for the community. As machine learning and computer vision extend into new domains, such as biology, a deeper understanding of vision tasks becomes increasingly important. This workshop will provide a crucial platform for discussing the systematic challenges of integrating theory and practice. Concretely, to bridge the gap between theoretical research in machine learning and its practical applications, the workshop aims to explore how theoretical tools can be leveraged to perform rigorous worst-case analysis, crucial for deploying machine learning models in safety-critical societal domains like healthcare, education, and sustainability.
Workshop: AI for Content Creation Thu 12 Jun 09:00 a.m.
AI for content creation plays a crucial role in domains such as photography, videography, virtual reality, gaming, art, design, fashion, and advertising, and lies at the intersection of computer vision, machine learning, computer graphics, and design. This workshop will provide attendees with a slice of cutting-edge techniques within this rapidly evolving field, considering both the fundamental technologies and practical challenges faced by designers and content creators, and will show successful applications of AI and deep learning in content creation. With invited speakers of world-class expertise in content creation, up-and-coming researchers, and posters from authors of submitted workshop papers, the workshop will help all to engage in a day filled with learning, discussion, and network building.
Workshop: Spatial Intelligence for Cultural Heritage Thu 12 Jun 09:00 a.m.
Workshop: Mechanistic Interpretability for Vision Thu 12 Jun 09:00 a.m.
Workshop: Multi-Agent Embodied Intelligent Systems Meet Generative-AI Era: Opportunities, Challenges and Futures Thu 12 Jun 09:00 a.m.
Workshop: LOVE: Multimodal Video Agent Thu 12 Jun 09:00 a.m.
5th Workshop on CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling Thu 12 Jun 09:20 a.m.
Many biological organisms have evolved to exhibit diverse behaviors, and understanding these behaviors is a fundamental goal of multiple disciplines including neuroscience, biology, animal husbandry, ecology, and animal conservation. These analyses require objective, repeatable, and scalable measurements of animal behaviors that are not possible with existing methodologies that leverage manual encoding from animal experts and specialists. Recently, computer vision has been making a significant impact across multiple disciplines by providing new tools for the detection, tracking, and analysis of animal behavior. This workshop brings together experts across fields to stimulate this new field of computer-vision-based animal behavioral understanding.
Workshop: Computer Vision for Drug Discovery: Where are we and What is Beyond? Thu 12 Jun 09:30 a.m.
The workshop aims to bridge the gap between computer vision, artificial intelligence, and the life sciences, with a focus on transformative advancements in drug discovery. By integrating innovative imaging modalities, such as Spatial Transcriptomics, Cell Painting, and Optical Pooled Screening, with state-of-the-art computer vision techniques, this workshop seeks to foster collaboration between experts in biomedical science, AI, and computer vision.

The workshop highlights the potential for revolutionizing drug discovery processes, driving faster and more accurate identification of therapeutic targets, and expediting the development of treatments for complex diseases. Addressing pressing challenges like cancer, neurodegenerative disorders, and pandemics, the focus lies on leveraging AI to analyze high-dimensional biological data, enhancing our understanding of disease mechanisms and responses to therapies.

For the CVPR community, this represents an exciting opportunity to expand beyond traditional image processing tasks into applications with tangible societal impact. By applying computer vision expertise to critical healthcare and pharmaceutical challenges, participants will engage with tasks like multi-modal data fusion, enhancing explainability in biomedical applications, and addressing the unique complexities of biological imaging, such as sparse or noisy datasets.

This workshop is aligned with CVPR’s growing emphasis on “AI for social good,” offering computer vision researchers a platform to contribute to advances in medical science that could improve the lives of millions. It is a call to action for interdisciplinary innovation, uniting diverse expertise to tackle some of the most critical challenges in global health.
Workshop: Agent in Interaction, from Humans to Robots Thu 12 Jun 09:30 a.m.
The Seventh Workshop on Precognition: Seeing through the Future Thu 12 Jun 12:00 p.m.
Vision-based detection and recognition studies have recently achieved highly accurate performance, bridging the gap between research and real-world applications. Beyond these well-explored detection and recognition capabilities of modern algorithms, vision-based forecasting is likely to be one of the next big research topics in computer vision. Visual prediction is one of the critical capabilities of humans, and the success of automatic vision-based forecasting would empower and unlock human-like capabilities in machines and robots.
One example application is autonomous driving, where vision-based understanding of a traffic scene and prediction of the movement of traffic actors is a critical piece of the autonomous puzzle. Another area where vision-based prediction is used is the medical domain, where it enables deep understanding and prediction of patients' future medical conditions. However, despite its potential and relevance to real-world applications, visual forecasting, or precognition, has not been the focus of new theoretical studies and practical applications as much as detection and recognition problems have.
Through this workshop, we aim to foster discussion and interest within the research community around this nascent topic. The workshop will cover recent approaches and research trends not only in anticipating human behavior from videos but also in precognition across multiple other visual applications, such as medical imaging, healthcare, face aging prediction, early event prediction, and autonomous driving forecasting.
2nd Workshop on Efficient and On-Device Generation (EDGE) Thu 12 Jun 12:30 p.m.
Workshop: Multi-modal Learning for Materials Science Thu 12 Jun 12:30 p.m.
Workshop: Visual Modeling Challenges for 2D-3D Virtual Try-On Thu 12 Jun 12:30 p.m.
Tutorial: Wenjin Wang · Daniel McDuff
Intelligent Healthcare based on Cameras and Wireless Sensors
This tutorial explores contactless health monitoring using cameras and RF sensors. Topics include measuring vital signs from skin or body imagery, emotion recognition, sleep staging, and activity recognition. It covers radar, WiFi, RFID, and acoustic-based RF sensing, highlighting multi-modal techniques that improve monitoring in healthcare, telemedicine, sports, and driver safety.
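To make the camera-based vital-sign idea concrete, here is a minimal, hypothetical sketch (not the presenters' code; all names are illustrative) of one common remote-photoplethysmography baseline: recovering a pulse rate from a mean green-channel trace via a Fourier peak in the plausible heart-rate band.

```python
import numpy as np

def heart_rate_from_signal(green_means, fps):
    """Estimate pulse rate (BPM) from a mean green-channel trace via FFT.

    A toy rPPG baseline: detrend, take the power spectrum, and pick the
    dominant frequency inside a plausible human pulse band.
    """
    x = np.asarray(green_means, dtype=np.float64)
    x = x - x.mean()  # remove the DC component (average brightness)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    power = np.abs(np.fft.rfft(x)) ** 2
    # restrict to 0.7-4.0 Hz, i.e. roughly 42-240 beats per minute
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][power[band].argmax()]

# synthetic 10 s trace at 30 fps with a 1.2 Hz (72 BPM) pulse plus noise
fps = 30
t = np.arange(300) / fps
sig = 0.02 * np.sin(2 * np.pi * 1.2 * t)
sig += 1e-3 * np.random.default_rng(0).normal(size=t.size)
bpm = heart_rate_from_signal(sig, fps)
```

Real systems add skin-region tracking, motion compensation, and multi-channel color projections; this sketch only shows the spectral-peak core of the idea.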
Tutorial: Thomas Pfeil
Power-efficient neural networks using low-precision data types and quantization
As neural networks grow, sustainability and cost become major challenges. This tutorial covers low-precision data types, quantization methods, and hands-on applications. Attendees will gain tools to maintain model performance while optimizing for efficiency on edge and large-scale deployments.
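As a rough illustration of the kind of method such a tutorial covers (a toy sketch, not the presenter's material), the following applies symmetric post-training int8 quantization to a weight tensor: a single scale maps the largest magnitude to 127, and dequantization recovers an approximation of the original weights.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization of a weight tensor to int8.

    One scale per tensor; the largest-magnitude weight maps to +/-127.
    """
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32 approximations of the weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# rounding error is bounded by half a quantization step
max_err = float(np.abs(w - w_hat).max())
```

Per-channel scales, asymmetric (zero-point) schemes, and quantization-aware training refine this basic recipe, which is the usual starting point for edge deployment.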
Tutorial: Tianyu Yang
This tutorial surveys the growing field of multimodal mathematical reasoning, combining CV, NLP, and symbolic logic. It addresses diagram interpretation, symbolic notation, and multi-step logic. Attendees will explore datasets, models, and evaluation, and discuss applications in education and science.
Workshop on Perception for Industrial Robotics Automation Thu 12 Jun 01:00 p.m.
This workshop addresses the gap between cutting-edge computer vision research and its practical application in industrial robotics, focusing on challenges in tasks such as reliable, scalable, and cost-effective bin picking. It brings together researchers and practitioners to discuss topics including 3D scene understanding, embodied AI, and robot learning, with an emphasis on developing robust solutions by considering factors like embodiment, camera choice, and data needs. Complementing the workshop, the Perception Challenge for Bin Picking offers a practical platform for participants to tackle real-world 6DoF pose estimation problems using a robot-in-the-loop evaluation, providing a more realistic performance assessment than traditional vision-only metrics. Together, the workshop and challenge aim to accelerate the adoption of vision-guided robotics and enhance the efficiency of industrial automation.
11th IEEE International Workshop on Computer Vision in Sports Thu 12 Jun 01:00 p.m.
Sports is often called the social glue of society: it allows people to interact irrespective of social status, age, and other differences. With the rise of mass media, significant resources have been channeled into sports to improve understanding, performance, and presentation. For example, performance assessment, previously mainly of interest to coaches and sports scientists, is now finding applications in broadcast and other media, driven by the growth of online sports viewing, which makes all sorts of performance statistics available to viewers. Computer vision has recently started to play an important role in sports, as seen, for example, in football, where real-time computer-vision-based graphics enhance different aspects of the game. Computer vision algorithms have huge potential in many aspects of sports, ranging from automatic annotation of broadcast footage to better understanding of sports injuries, coaching, and enhanced viewing. So far, the use of computer vision in sports has been scattered across different disciplines. The ambition of this workshop is to bring together practitioners and researchers from different disciplines to share ideas and methods on the current and future use of computer vision in sports.
8th Workshop and Competition on Affective & Behavior Analysis in-the-wild Thu 12 Jun 01:00 p.m.
The ABAW Workshop is a premier platform highlighting the latest advancements in multimodal analysis, generation, modeling, and understanding of human affect and behavior in real-world, unconstrained environments. It emphasizes cutting-edge systems that integrate facial expressions, body movements, gestures, natural language, voice, and speech to enable impactful research and practical applications. The workshop fosters interdisciplinary collaboration across fields such as computer vision, AI, human-machine interaction, psychology, robotics, ethics, and healthcare. It further addresses complex challenges such as algorithmic fairness, demographic bias, and data privacy, making it a vital forum for building equitable, generalizable, and human-centered AI systems. By uniting experts from academia, industry, and government, the workshop promotes innovation, drives knowledge exchange, and inspires new directions in affective computing, behavior modeling and understanding, and human-computer interaction. Finally, the workshop includes a competition with six challenges, including valence-arousal estimation, basic and compound expression recognition, action unit detection, emotional mimicry intensity estimation, and ambivalence/hesitancy recognition.
Tutorial: Jason Clemons, Hongxu (Danny) Yin, and Xinglong Sun
Full-Stack, GPU-based Acceleration of Deep Learning and Foundation Models
This tutorial offers insights across the hardware-software stack to accelerate deep neural networks, from convolutions to multimodal LLMs. Attendees will learn practical tools and trade-offs to optimize performance and inspire the next generation of scalable acceleration techniques.
Workshop: Domain Generalization: Evolution, Breakthroughs, and Future Horizons Thu 12 Jun 01:00 p.m.
The 6th International Workshop and Prize Challenge on Agriculture-Vision: Challenges & Opportunities for Computer Vision in Agriculture in conjunction with IEEE CVPR 2025 Thu 12 Jun 01:00 p.m.
With the recent success of computer vision and deep learning in various applications, there has been significantly increasing attention towards their use in agriculture, presenting significant economic and social opportunities. The 6th International Workshop and Prize Challenge on Agriculture-Vision aims to foster research and applications at the intersection of computer vision and agriculture, addressing challenges in real-world agricultural scenarios, building on a strong record from prior editions at CVPR 2020-2024. The workshop will feature a computer vision challenge and invited speakers from diverse academic and industry backgrounds, including computer vision, robotics, and agriculture, as well as top industry practitioners. This event provides a platform to showcase current progress in these interdisciplinary areas and to encourage further research and development of foundation models in agriculture.
Workshop: Open-World 3D Scene Understanding with Foundation Models Thu 12 Jun 01:00 p.m.
Workshop: VizWiz Grand Challenge Thu 12 Jun 01:00 p.m.
First Workshop on Experimental Model Auditing via Controllable Synthesis (EMACS) Thu 12 Jun 01:00 p.m.
With the increasing adoption of machine learning models in high-stakes applications, rigorous audits of model behavior have assumed paramount importance. However, traditional auditing methods fall short of being truly experimental, as they rely on wild-caught observational data that has been manually labeled. Enter generative techniques, which have recently shown impressive capabilities in automatically generating and labeling high-quality synthetic data at scale. Critically, many such methods allow for the isolation and manipulation of specific attributes of interest, paving the path towards robust experimental analysis.

This workshop is dedicated to exploring techniques for auditing the behavior of machine learning models, including (but not limited to) performance, bias, and failure modes, through the controlled synthesis (via generation or simulation) of data. Of special interest are algorithms for generating data (images, text, audio, etc.) and benchmarking methods that provide reliable insights into model behavior by minimizing the impact of potential confounders. We also welcome work on the broader topic of using synthetic or quasi-synthetic data for model debugging, broadly construed, with the goal of providing a venue for interdisciplinary exchange of ideas on this emerging topic.
Tutorial: Zhengyuan Yang
Recent Advances in Vision Foundation Models
This tutorial covers cutting-edge developments in vision foundation models. Topics include multimodal understanding and generation, scaling test-time compute, and applications for physical and virtual agents. The session will provide insights into the design and future directions of vision-based foundation models.
The first Workshop on Enforcing Geometric, Physical, Topological, and Functional Inductive Bias in 3D Generation Thu 12 Jun 01:00 p.m.
Tutorial: Constantin Seibold
This tutorial explores techniques for dataset curation, quality monitoring, dimensionality reduction (t-SNE, UMAP, h-NNE), and clustering (k-means, DBSCAN, FINCH). Attendees will learn how to use these methods to understand structure, reduce bias, detect outliers, and improve performance in AI and CV workflows.
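To illustrate one of the named techniques, here is a minimal, self-contained k-means sketch (a toy baseline written for this listing, not the presenter's code); in practice one would use a tuned library implementation such as the FINCH or DBSCAN variants the tutorial discusses.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k distinct data points
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels, centers

# two well-separated 2D blobs: clustering should recover the split
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
labels, centers = kmeans(X, 2)
```

In a curation workflow, the resulting cluster labels are typically inspected (e.g. after a 2D embedding with t-SNE or UMAP) to surface duplicate groups, outliers, and under-represented modes in the dataset.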
Workshop: AI for Creative Visual Content Generation, Editing and Understanding Thu 12 Jun 01:00 p.m.
Visual content creation is booming, yet producing engaging visual content remains a challenging task. This workshop aims to highlight machine learning technologies that accelerate and enhance creative processes in visual content creation and editing, including image animation, text-to-visual content generation, and content translation. Moreover, we believe that advancing technology to better understand edited visual content can enable novel platforms for creating compelling media. We seek to bridge the gap between technical and creative communities by bringing together researchers from computer vision, graphics, and the arts, fostering interdisciplinary collaboration and exploring opportunities in this under-explored area.
The 4th Workshop on Transformers for Vision Thu 12 Jun 01:20 p.m.
Catch UAVs that Want to Watch You: Detection and Tracking of Unmanned Aerial Vehicle (UAV) in the Wild and the 4th Anti-UAV Workshop & Challenge Thu 12 Jun 01:30 p.m.
Workshop on 3D Human Understanding Thu 12 Jun 01:30 p.m.
ReGenAI: Second Workshop on Responsible Generative AI Thu 12 Jun 01:30 p.m.
Workshop: Physics-inspired 3D Vision and Imaging Thu 12 Jun 01:45 p.m.
3D computer vision has become fundamental to technologies ranging from medical imaging to astronomy and from AR/VR to embodied intelligence. New sensors and imaging modalities like structured light, time-of-flight, and light field microscopy are being developed to make 3D vision more tractable; but even with new types of sensor data, many problems in 3D vision are ill-posed, and solving them often relies on heuristics or data-driven priors. Unfortunately, these priors can fail, especially for problems where ground-truth data is not available or for niche sensors where capturing large datasets is not feasible. A promising, but often overlooked, alternative is to incorporate knowledge of physics (e.g., physical light transport) into 3D computer vision algorithms, which can better constrain the solutions they produce.
The goal of this workshop is to highlight work in 3D computer vision and imaging that makes use of physics-inspired modeling and physical-priors, showcasing their importance even with the prevalence of neural priors and big data. Examples include methods that apply physics-based approaches to inverse rendering, 3D microscopy, tomography, and light-in-flight imaging; or methods that combine such approaches with novel tools like neural radiance fields (NeRFs), 3D Gaussian Splatting (3DGS), and generative image/video models.