Timezone: America/Denver
Tutorial
9:00 AM - 1:00 PM

Checkout-free retail represents one of the most challenging real-world computer vision deployments, requiring reliable performance across hundreds of sites and millions of interactions. This tutorial bridges academic research and production deployment by framing checkout-free retail as a canonical large-scale multi-camera vision systems problem. The tutorial focuses on three foundational pillars, presented as generalizable computer vision problems rather than application-specific solutions. Each component is discussed in terms of its underlying formulations, scalability constraints, failure modes, and design tradeoffs that transfer directly to autonomous driving, smart spaces, sports analytics, warehouses, and urban sensing.

Automatic Multi-Camera Calibration: Continuous online estimation addressing drift and partial failures using deep learning and conventional CV pipelines.
Real-Time Multi-Camera Tracking: Global data association under asynchronous, unreliable observations via integer programming and graph-based formulations.
Structured Event Detection: Inference from partial visual evidence with edge computing, Kubernetes deployment, and CPU/GPU/TPU optimization.

A central theme is how infrastructure constraints, including limited bandwidth, latency requirements, camera reliability, and edge computing budgets, fundamentally shape algorithmic and architectural decisions. Attendees will learn how classical and modern deep learning techniques are adapted for continuous online operation, partial observability, and 99.9%+ system reliability.

Tutorial

Analytic understanding of diffusion models

Artem Lukoianov · Chenyang Yuan · Christopher Scarvelis · Mason Kamb
9:00 AM - 6:00 PM

This tutorial is designed to provide the required mathematical background to develop the intuition behind generalization of deep diffusion models. Starting from the fundamentals of diffusion, we build up to the current understanding of their generalization mechanisms in the most recent papers in the field.

Tutorial
9:00 AM - 1:00 PM

Recent advances in artificial intelligence (AI) have significantly transformed medical imaging, enabling substantial progress in image acquisition, reconstruction, diagnosis, prognosis, and clinical decision support. Advances in deep learning, foundation models, multimodal integration, and generative modeling have improved accuracy and robustness across modalities such as MRI, CT, X-ray, ultrasound, and digital pathology. Despite these successes, critical challenges remain, including limited generalizability and interpretability, data heterogeneity, privacy concerns, and barriers to real-world clinical deployment. This tutorial provides a concise and up-to-date overview of recent advances in AI for medical imaging, reviewing key paradigms such as physics-informed and interpretable learning, privacy-preserving collaborative learning, and open-source medical imaging foundation models. We further discuss open challenges and future research directions shaping the next generation of medical imaging AI.

Tutorial

Millimeter-wave (mmWave) signals, such as those used in 5G, 6G, and next-generation WiFi, are a unique modality that can travel through many everyday occlusions (e.g., cardboard, fabric, fog), allowing them to sense objects or scenes that are hidden from view. This capability has sparked recent interest in the computer vision community and beyond for using these signals to enable novel perception tasks, with applications spanning autonomous driving, robotics, shipping and logistics, and more. The goal of this tutorial is to introduce audience members to this modality and equip them with the knowledge needed to begin research in this area. We will cover both fundamental millimeter-wave imaging concepts and recent, state-of-the-art methods. We will discuss different applications of millimeter-wave sensing, including a deep dive into two areas: through-occlusion 3D object reconstruction and all-weather scene understanding. We will additionally cover existing datasets, benchmarks, and tools so that audience members new to this area can begin research in this field. This tutorial is designed to be accessible to an audience with no prior millimeter-wave experience.

Topics to be covered include:
How millimeter-wave signals differ from visible light
How millimeter-wave signals differ from other through-occlusion modalities (e.g., X-ray, ultrasound)
Various applications of mmWave sensing, including how they have been used in the CV community
Classical methods for using mmWave signals to produce a 2D or 3D image
Limitations of classical mmWave imaging
State-of-the-art methods for using mmWave signals to perform surface normal estimation for 3D object reconstruction
State-of-the-art methods for using mmWave signals for complete scene reconstruction, segmentation, and object detection
How researchers can get started in this area, including existing datasets, benchmarks, and tools

Tutorial

All You Need To Know About Self-Driving

Raquel Urtasun · Abbas Sadat · Sivabalan Manivasagam · Jingkang Wang · Ioan Andrei Barsan
9:00 AM - 6:00 PM
Tutorial
9:00 AM - 1:00 PM

The robotics and Physical AI space has been a strong and growing topic at CVPR, especially with computer vision advancements in VLM and VLA models that have become key research areas in recent years. The community has developed several Vision-Language-Action models such as GR00T, π0, OpenVLA, SmolVLA, and ACT. However, building a complete robotics pipeline, from data collection to model training to deployment, remains a challenging multi-disciplinary endeavor. Data collection often requires expensive hardware and software solutions, which has prohibited many researchers from pursuing this path. Foundation models require careful architecture design and post-training strategies. And deploying models on edge devices demands hardware-aware optimizations to achieve real-time performance.

This tutorial bridges these gaps by providing a hands-on, end-to-end walkthrough of the full Physical AI stack. By the end, attendees will understand the high-level frameworks, tools, and open-source community activities around robotics, embedded devices, and model training, enabling researchers, industry partners, and communities worldwide to improve collaborations in this complex and growing field.

Workshop
9:00 AM - 6:00 PM
Workshop

Multimodal Algorithmic Reasoning Workshop

Anoop Cherian ⋅ Suhas Lohit
9:00 AM - 1:00 PM
Workshop
9:00 AM - 6:00 PM
Workshop
Workshop

9th Multimodal Learning and Applications Workshop

Paolo Rota ⋅ Michael Ying Yang
9:00 AM - 6:00 PM
Workshop
9:00 AM - 6:00 PM
Workshop

6th Omnidirectional Computer Vision Workshop

Pierre Moulon ⋅ Guillaume Caron
9:00 AM - 1:00 PM
Workshop

Personalization in Generative AI Workshop

Pinar Yanardag ⋅ Nupur Kumari
9:00 AM - 1:00 PM
Workshop
Workshop
9:00 AM - 1:00 PM
Workshop
Workshop

Open-World Vision

Shu Kong ⋅ Neehar Peri
9:00 AM - 1:00 PM

Open-World Vision (OWV) emphasizes realistic opportunities and challenges in developing and deploying computer vision systems in the dynamic, vast, and unpredictable real open world, which offers abundant data that can benefit training and challenge testing. It contrasts with the traditional "closed-world" paradigm of visual learning and inference, which assumes fixed, known data distributions and categorical labels. Models developed under such closed-world assumptions tend to be brittle when encountering ever-changing and novel scenarios in the real open world. Modern visual learning has shifted towards an open-world paradigm, such as pretraining foundation models on massive data sourced from the open world (e.g., web-sourced data). While these models show unprecedented performance and strong adaptability to downstream tasks, they inherit biases from their open-world pretraining data and can still fail in truly novel or underrepresented scenarios during deployment. This workshop aims not only to uncover current limitations, potential risks, emerging opportunities, and unresolved challenges of open-world vision, but also to solicit solutions that advance the field toward more robust, fair, and adaptable visual systems.

Workshop

4th Workshop on Maritime Computer Vision

Benjamin Kiefer ⋅ Jon Muhovic
9:00 AM - 1:00 PM
Workshop
9:00 AM - 1:00 PM
Workshop

Exploring the Next Generation of Data

Nadine Chang ⋅ Maying Shen
9:00 AM - 1:00 PM
Workshop
9:00 AM - 1:00 PM
Workshop

PhysHuman: Physically Grounded Human Perception and Modeling

Feng Liu ⋅ Youngjoong Kwon ⋅ Cheng Zhang
9:00 AM - 1:00 PM
Workshop
9:00 AM - 1:00 PM
Workshop

Workshop on Visual Concepts

Joy Hsu ⋅ R. Kenny Jones
9:00 AM - 6:00 PM
Workshop
9:00 AM - 6:00 PM
Workshop
Workshop
9:00 AM - 1:00 PM
Workshop
Workshop

How Do Vision Models Work?

Tamar Rott Shaham ⋅ Amil Dravid
9:00 AM - 6:00 PM
Workshop

Sight and Sound

Andrew Owens ⋅ Jiajun Wu
9:00 AM - 6:00 PM
Workshop
9:00 AM - 6:00 PM
Workshop
9:00 AM - 1:00 PM
Workshop
Workshop

4th Workshop on Generative Models for Computer Vision

Adam Kortylewski ⋅ Fangneng Zhan
9:00 AM - 6:00 PM
Workshop
Workshop

Third Workshop for Learning 3D with Multi-View Supervision

Abdullah J Hamdi ⋅ Silvio Giancola
9:00 AM - 1:00 PM
Workshop

Safe Artificial Intelligence for All Domains

Oliver Wasenmüller ⋅ Markus Enzweiler
9:00 AM - 1:00 PM
Workshop
9:00 AM - 6:00 PM
Workshop

Humans of Generative AI

Jaron Mink ⋅ David Forsyth
9:00 AM - 1:00 PM
Workshop
9:00 AM - 6:00 PM
Workshop

2nd Workshop on Video Large Language Models

Rohit Gupta ⋅ Sirnam Swetha
9:00 AM - 6:00 PM
Workshop
9:00 AM - 1:00 PM
Workshop
9:00 AM - 6:00 PM
Workshop

Workshop on Any-to-any Multimodal Learning

Shengqiong Wu ⋅ Wei Dai
9:00 AM - 1:00 PM
Tutorial

Foundations and Frontiers of Watermarking: Algorithms, Multimodal Extensions, Benchmarks, and Authenticity Frameworks

Vishal Asnani · Shruti Agarwal · Benedetta Tondi · Pierre Fernandez · Furong Huang
1:00 PM - 6:00 PM

Invisible watermarking has re-emerged as a critical pillar of trustworthy AI and media authenticity in the era of generative models. This tutorial provides a unified end-to-end treatment of watermarking, spanning classical signal-processing theory, modern deep-learning methods, multimodal extensions (image, video, audio, 3D), robustness benchmarking (WAVES), and real-world deployment within provenance ecosystems such as C2PA and Content Credentials.

Tutorial

From Perception to Action: Building Efficient and Deployable Robot Intelligence Pipelines with Open-Source Edge AI Toolkits

Samet Akcay · Zhuo Wu · Michael Paulitsch · Ashutosh Kumar · Tao Xiong · Adrian Boguszewski · Sameer Sheorey · Benjamin Ummenhofer
1:00 PM - 6:00 PM

Resource for CVPR 2026 tutorial on "From Perception to Action: Building Efficient and Deployable Robot Intelligence Pipelines with Open-Source Edge AI Toolkits"

Tutorial

The Road to Convergence: Evolution of Unified Multimodal Models

Jindong Wang · Hao Chen · Jiakui Hu · Zhaolong Su · Sharon Li
1:00 PM - 6:00 PM

This tutorial traces the evolution of multimodal AI from isolated expertise to Unified Multimodal Models (UMMs). We introduce the core motivations driving unification, particularly the mutual reinforcement between understanding and generation, and provide a rigorous definition of UMMs.

Workshop
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop

Appearance Understanding and Generation

Elena Garces ⋅ Giuseppe Vecchio
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop

Artificial Intelligence for Space

Daniele Gammelli ⋅ Gabriele Meoni
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop

2nd Workshop on GenAI for Storytelling

Andrew Shin ⋅ Yusuke Mori
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop
Workshop

CVPR 2026 Biometrics Workshop

Bir Bhanu ⋅ Ajay Kumar
1:00 PM - 6:00 PM
Workshop

Big Model Adaptation In Computer Vision

Yuki Asano ⋅ Anna Kukleva
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop

The 7th International Workshop on Eye and Gaze in Computer Vision

Yihua Cheng ⋅ Seonwook Park ⋅ Hyung Jin Chang
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop

Imagine a world where computer vision-based systems can analyze a video of an athlete, a surgeon, a patient, or a factory worker and instantly provide expert-level actionable feedback: correcting techniques, identifying inefficiencies, and helping people refine their skills in real time. Thanks to rapid progress in video understanding, this vision is becoming reality. AI-powered systems can now analyze complex human activities, assess performance, and generate intelligent feedback, unlocking new possibilities in sports, healthcare, manufacturing, education, rehabilitation, and beyond. Through expert keynotes and invited contributions, this CVPR 2026 workshop will explore the cutting edge of skilled activity understanding, assessment, and feedback generation, bridging research and real-world applications.

As AI systems become more capable of understanding human expertise, the implications are profound: empowering individuals with personalized coaching, democratized skill development, and scalable training solutions. We invite researchers, industry leaders, and practitioners to join us in shaping the future of AI-powered skill understanding. Whether working on foundational research, applied solutions, or real-world deployment, this workshop is an opportunity and forum to learn about and push the boundaries of how AI perceives, evaluates, and enhances human ability.

Workshop
Workshop
1:00 PM - 6:00 PM
Workshop

Visual Anomaly and Novelty Detection - 4th Edition

Philipp Seeböck ⋅ Latha Pemula
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM
Workshop

1st Workshop on Generative 3D Reconstruction

Daniel Barath ⋅ Fabian Manhardt
1:00 PM - 6:00 PM
Workshop
1:00 PM - 6:00 PM